Commit 62646b5

PDEP-1: Purpose and guidelines for pandas enhancement proposals (#47444)
1 parent d6563c5 commit 62646b5

5 files changed: +229 -38 lines changed

web/pandas/about/roadmap.md

Lines changed: 35 additions & 36 deletions
@@ -15,10 +15,35 @@ fundamental changes to the project that are likely to take months or
 years of developer time. Smaller-scoped items will continue to be
 tracked on our [issue tracker](https://github.com/pandas-dev/pandas/issues).
 
-See [Roadmap evolution](#roadmap-evolution) for proposing
-changes to this document.
+The roadmap is defined as a set of major enhancement proposals named PDEPs.
+For more information about PDEPs, and how to submit one, please refer to
+[PDEP-1](/pdeps/accepted/0001-puropose-and-guidelines.html).
 
-## Extensibility
+## PDEPs
+
+{% for pdep_type in ["Under discussion", "Accepted", "Implemented", "Rejected"] %}
+
+<h3 id="pdeps-{{pdep_type}}">{{ pdep_type.replace("_", " ").capitalize() }}</h3>
+
+<ul>
+{% for pdep in pdeps[pdep_type] %}
+<li><a href="{{ pdep.url }}">{{ pdep.title }}</a></li>
+{% else %}
+<li>There are currently no PDEPs with this status</li>
+{% endfor %}
+</ul>
+
+{% endfor %}
+
+## Roadmap points pending a PDEP
+
+<div class="alert alert-warning" role="alert">
+pandas is in the process of moving roadmap points to PDEPs (implemented in
+August 2022). During the transition, some roadmap points will exist as PDEPs,
+while others will exist as sections below.
+</div>
+
+### Extensibility
 
 Pandas `extending.extension-types` allow
 for extending NumPy types with custom data types and array storage.
@@ -33,7 +58,7 @@ library, making their behavior more consistent with the handling of
 NumPy arrays. We'll do this by cleaning up pandas' internals and
 adding new methods to the extension array interface.
 
-## String data type
+### String data type
 
 Currently, pandas stores text data in an `object` -dtype NumPy array.
 The current implementation has two primary drawbacks: First, `object`
@@ -54,7 +79,7 @@ work, we may need to implement certain operations expected by pandas
 users (for example the algorithm used in, `Series.str.upper`). That work
 may be done outside of pandas.
 
-## Apache Arrow interoperability
+### Apache Arrow interoperability
 
 [Apache Arrow](https://arrow.apache.org) is a cross-language development
 platform for in-memory data. The Arrow logical types are closely aligned
@@ -65,7 +90,7 @@ data types within pandas. This will let us take advantage of its I/O
 capabilities and provide for better interoperability with other
 languages and libraries using Arrow.
 
-## Block manager rewrite
+### Block manager rewrite
 
 We'd like to replace pandas current internal data structures (a
 collection of 1 or 2-D arrays) with a simpler collection of 1-D arrays.
@@ -92,7 +117,7 @@ See [these design
 documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals)
 for more.
 
-## Decoupling of indexing and internals
+### Decoupling of indexing and internals
 
 The code for getting and setting values in pandas' data structures
 needs refactoring. In particular, we must clearly separate code that
@@ -150,7 +175,7 @@ which are actually expected (typically `KeyError`).
 and when small differences in behavior are expected (e.g. getting with `.loc` raises for
 missing labels, setting still doesn't), they can be managed with a specific parameter.
 
-## Numba-accelerated operations
+### Numba-accelerated operations
 
 [Numba](https://numba.pydata.org) is a JIT compiler for Python code.
 We'd like to provide ways for users to apply their own Numba-jitted
@@ -162,7 +187,7 @@ window contexts). This will improve the performance of
 user-defined-functions in these operations by staying within compiled
 code.
 
-## Documentation improvements
+### Documentation improvements
 
 We'd like to improve the content, structure, and presentation of the
 pandas documentation. Some specific goals include
@@ -177,7 +202,7 @@ pandas documentation. Some specific goals include
 subsections of the documentation to make navigation and finding
 content easier.
 
-## Performance monitoring
+### Performance monitoring
 
 Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
 to monitor for performance regressions. ASV itself is a fabulous tool,
@@ -197,29 +222,3 @@ We'd like to fund improvements and maintenance of these tools to
 <https://pyperf.readthedocs.io/en/latest/system.html>
 - Build a GitHub bot to request ASV runs *before* a PR is merged.
 Currently, the benchmarks are only run nightly.
-
-## Roadmap Evolution
-
-Pandas continues to evolve. The direction is primarily determined by
-community interest. Everyone is welcome to review existing items on the
-roadmap and to propose a new item.
-
-Each item on the roadmap should be a short summary of a larger design
-proposal. The proposal should include
-
-1. Short summary of the changes, which would be appropriate for
-inclusion in the roadmap if accepted.
-2. Motivation for the changes.
-3. An explanation of why the change is in scope for pandas.
-4. Detailed design: Preferably with example-usage (even if not
-implemented yet) and API documentation
-5. API Change: Any API changes that may result from the proposal.
-
-That proposal may then be submitted as a GitHub issue, where the pandas
-maintainers can review and comment on the design. The [pandas mailing
-list](https://mail.python.org/mailman/listinfo/pandas-dev) should be
-notified of the proposal.
-
-When there's agreement that an implementation would be welcome, the
-roadmap should be updated to include the summary and a link to the
-discussion issue.
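
The roadmap template added in this file renders one `<h3>` heading per status and uses Jinja's `{% for %} ... {% else %}` construct to emit a placeholder item when a status has no PDEPs. As a rough illustration, the plain-Python sketch below reproduces that markup without a Jinja environment. The helper name `render_pdep_lists` is hypothetical, and it assumes `pdeps` maps each status string to a list of `{"title": ..., "url": ...}` dicts, which is the shape the `roadmap_pdeps` preprocessor builds. It inserts titles and URLs verbatim; whether the real site build escapes them depends on its Jinja configuration.

```python
# Status strings in the same order the roadmap template iterates over them.
PDEP_TYPES = ["Under discussion", "Accepted", "Implemented", "Rejected"]


def render_pdep_lists(pdeps: dict[str, list[dict[str, str]]]) -> str:
    """Hypothetical plain-Python approximation of the roadmap.md Jinja loop."""
    parts = []
    for pdep_type in PDEP_TYPES:
        heading = pdep_type.replace("_", " ").capitalize()
        parts.append(f'<h3 id="pdeps-{pdep_type}">{heading}</h3>')
        parts.append("<ul>")
        items = pdeps.get(pdep_type, [])
        for pdep in items:
            # Titles and URLs are inserted verbatim (no HTML escaping).
            parts.append(f'<li><a href="{pdep["url"]}">{pdep["title"]}</a></li>')
        if not items:
            # Counterpart of the template's `{% else %}` branch.
            parts.append("<li>There are currently no PDEPs with this status</li>")
        parts.append("</ul>")
    return "\n".join(parts)
```

Because the loop looks lists up by the exact status strings, whatever keys the preprocessor writes into `context["pdeps"]` have to match them character for character; an entry stored under a differently spelled key would never be displayed.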

web/pandas/config.yml

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@ main:
 - pandas_web.Preprocessors.blog_add_posts
 - pandas_web.Preprocessors.maintainers_add_info
 - pandas_web.Preprocessors.home_add_releases
+- pandas_web.Preprocessors.roadmap_pdeps
 markdown_extensions:
 - toc
 - tables
@@ -177,3 +178,5 @@ sponsors:
 - name: "Gousto"
 url: https://www.gousto.co.uk/
 kind: partner
+roadmap:
+  pdeps_path: pdeps
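
The new `pandas_web.Preprocessors.roadmap_pdeps` entry is referenced by a dotted path, and `pandas_web.py` contains a `get_callable(obj_as_str: str) -> object` helper whose body is not part of this diff. One plausible way to turn such a string into a callable is sketched below with `importlib`: import the longest prefix that is a module, then walk the remaining attributes. This is an assumption about the approach, not the site's actual implementation, and `resolve_dotted` is a hypothetical name.

```python
import importlib


def resolve_dotted(obj_as_str: str) -> object:
    """Hypothetical resolver for strings like "package.module.Class.attr"."""
    parts = obj_as_str.split(".")
    # Try the longest candidate module path first, then shorter ones.
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ModuleNotFoundError:
            continue
        # Whatever is left after the module path is looked up as attributes.
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Could not import any module from {obj_as_str!r}")
```

For the config entry above, this would import `pandas_web` and return `Preprocessors.roadmap_pdeps`; because it is a `staticmethod`, attribute access on the class yields the plain function, which can then be called with the context dict.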
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+# PDEP-1: Purpose and guidelines
+
+- Created: 3 August 2022
+- Status: Under discussion
+- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444)
+- Author: [Marc Garcia](https://github.com/datapythonista)
+- Revision: 1
+
+## PDEP definition, purpose and scope
+
+A PDEP (pandas enhancement proposal) is a proposal for a **major** change in
+pandas, similar to a Python [PEP](https://peps.python.org/pep-0001/)
+or a NumPy [NEP](https://numpy.org/neps/nep-0000.html).
+
+Bug fixes and conceptually minor changes (e.g. adding a parameter to a function)
+are out of the scope of PDEPs. A PDEP should be used for changes that are not
+immediate and not obvious, and are expected to require a significant amount of
+discussion and detailed documentation before being implemented.
+
+PDEPs are appropriate for user-facing changes, internal changes and organizational
+discussions. Examples of topics worth a PDEP could include moving a module from
+pandas to a separate repository, a refactoring of the pandas block manager or
+a proposal of a new code of conduct.
+
+## PDEP guidelines
+
+### Target audience
+
+A PDEP is a public document available to anyone, but the main stakeholders to
+consider when writing a PDEP are:
+
+- The core development team, who will have the final decision on whether a PDEP
+is approved or not
+- Contributors to pandas and other related projects, and experienced users. Their
+feedback is highly encouraged and appreciated, to make sure all points of view
+are taken into consideration
+- The wider pandas community, in particular users, who may or may not have feedback
+on the proposal, but should know and be able to understand the future direction of
+the project
+
+### PDEP authors
+
+Anyone can propose a PDEP, but in most cases developers of pandas itself and related
+projects are expected to author PDEPs. If you are unsure whether you should be opening
+an issue or creating a PDEP, it's probably safe to start by
+[opening an issue](https://github.com/pandas-dev/pandas/issues/new/choose), which can
+eventually be moved to a PDEP.
+
+### Workflow
+
+The possible states of a PDEP are:
+
+- Under discussion
+- Accepted
+- Implemented
+- Rejected
+
+The workflow that PDEPs can follow is described below.
+
+#### Submitting a PDEP
+
+Proposing a PDEP is done by creating a PR adding a new file to `web/pdeps/`.
+The file is a markdown file; you can use `web/pdeps/0001.md` as a reference
+for the expected format.
+
+The initial status of a PDEP will be `Status: Under discussion`. This will be changed
+to `Status: Accepted` when the PDEP is ready and has the approval of the core team.
+
+#### Accepted PDEP
+
+A PDEP can only be accepted by the core development team if the proposal is considered
+worth implementing. Decisions will be made based on the process detailed in the
+[pandas governance document](https://github.com/pandas-dev/pandas-governance/blob/master/governance.md).
+In general, more than one approval will be needed before the PR is merged, and
+there should not be any `Request changes` review at the time of merging.
+
+Once a PDEP is accepted, any contributions can be made toward the implementation of the PDEP,
+with an open-ended completion timeline. Development of pandas is difficult to understand and
+forecast, since the contributors to pandas are a mix of volunteers and developers paid from different sources,
+with different priorities. Companies, institutions or individuals interested in seeing a
+PDEP implemented, or in seeing progress on the pandas roadmap in general, can check how
+they can help on the [contributing page](/contribute.html).
+
+#### Implemented PDEP
+
+Once a PDEP is implemented and available in the main branch of pandas, its
+status will be changed to `Status: Implemented`, making it visible that the PDEP
+is no longer part of the roadmap and future plans, but a change that has already
+happened. The first pandas version in which the PDEP implementation is
+available will also be included in the PDEP header, for example
+`Implemented: v2.0.0`.
+
+#### Rejected PDEP
+
+A PDEP can be rejected when the final decision is that its implementation is
+not in the best interests of the project. Rejected PDEPs are as useful as accepted
+PDEPs, since they record discussions that are worth having and decisions made
+about changes to pandas. They will be merged with `Status: Rejected`, so
+there is visibility on what was discussed and what the outcome of the
+discussion was. A PDEP can be rejected for different reasons, for example a good idea
+that isn't backward-compatible, where the breaking changes aren't considered worth
+implementing.
+
+#### Invalid PDEP
+
+For submitted PDEPs that do not contain proper documentation, are out of scope, or
+are not useful to the community for any other reason, the PR will be closed after
+discussion with the author, instead of being merged as rejected. This is to avoid
+adding noise to the list of rejected PDEPs, which should contain documentation as
+good as an accepted PDEP, but where the final decision was to not implement the changes.
+
+## Evolution of PDEPs
+
+Most PDEPs aren't expected to change after they are accepted. Once there is agreement on the changes,
+and they are implemented, the PDEP will only be useful for understanding why the development happened
+and the details of the discussion.
+
+In some cases, however, a PDEP can be updated: for example, a PDEP defining a procedure or
+a policy, like this one (PDEP-1), or a PDEP for which attempting the implementation
+yields new knowledge that makes the original PDEP obsolete, so changes are
+required. When there are specific changes to be made to the original PDEP, the PDEP will
+be edited, its `Revision: X` label will be increased by one, and a note will be added
+to the `PDEP-N history` section. This will let readers understand that the PDEP has
+changed and avoid confusion.
+
+### PDEP-1 History
+
+- 3 August 2022: Initial version
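
PDEP-1 expects each proposal to start with a `# ` title line followed by metadata bullets such as `- Status:`, `- Revision:` and, once implemented, `- Implemented:`. The sketch below parses that header and validates the status against the four states in the Workflow section, much as the `roadmap_pdeps` preprocessor does when it scans the PDEP directory. The function name `parse_pdep_header`, the lower-cased keys in the returned dict, and the rule that the header ends at the first `## ` heading are assumptions for illustration.

```python
KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"}


def parse_pdep_header(text: str) -> dict[str, str]:
    """Hypothetical parser for the PDEP metadata header described in PDEP-1."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("A PDEP must start with a '# ' title line")
    header = {"title": lines[0][2:].strip()}
    for line in lines[1:]:
        if line.startswith("## "):
            break  # assume the metadata bullets end at the first section heading
        if line.startswith("- ") and ": " in line:
            # Split only on the first ": " so URLs such as https://... survive.
            key, value = line[2:].split(": ", 1)
            header[key.strip().lower()] = value.strip()
    status = header.get("status")
    if status not in KNOWN_STATUS:
        raise ValueError(
            f"Unknown status {status!r}; expected one of {sorted(KNOWN_STATUS)}"
        )
    return header
```

Running it on the header of this PDEP yields `title`, `created`, `status`, `discussion`, `author` and `revision` keys; a missing or misspelled status raises instead of silently filing the PDEP under an unknown heading.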

web/pandas/static/css/pandas.css

Lines changed: 6 additions & 2 deletions
@@ -8,15 +8,19 @@ h1 {
 color: #130654;
 }
 h2 {
-font-size: 1.45rem;
+font-size: 1.8rem;
 font-weight: 700;
-color: black;
+color: #130654;
 }
 h3 {
 font-size: 1.3rem;
 font-weight: 600;
 color: black;
 }
+h3 a {
+color: black;
+text-decoration: underline dotted !important;
+}
 a {
 color: #130654;
 }

web/pandas_web.py

Lines changed: 57 additions & 0 deletions
@@ -24,10 +24,12 @@
 The rest of the items in the file will be added directly to the context.
 """
 import argparse
+import collections
 import datetime
 import importlib
 import operator
 import os
+import pathlib
 import re
 import shutil
 import sys
@@ -185,6 +187,61 @@ def home_add_releases(context):
             )
         return context
 
+    @staticmethod
+    def roadmap_pdeps(context):
+        """
+        PDEPs (pandas enhancement proposals) are not part of the navigation
+        bar. They are included as lists in the "Roadmap" page
+        and linked from there. This preprocessor obtains the list of
+        PDEPs in each status from the directory tree and GitHub.
+        """
+        KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"}
+        context["pdeps"] = collections.defaultdict(list)
+
+        # accepted, rejected and implemented
+        pdeps_path = (
+            pathlib.Path(context["source_path"]) / context["roadmap"]["pdeps_path"]
+        )
+        for pdep in sorted(pdeps_path.iterdir()):
+            if pdep.suffix != ".md":
+                continue
+            with pdep.open() as f:
+                title = f.readline()[2:]  # removing markdown title "# "
+                status = None
+                for line in f:
+                    if line.startswith("- Status: "):
+                        status = line.strip().split(": ", 1)[1]
+                        break
+                if status not in KNOWN_STATUS:
+                    raise RuntimeError(
+                        f'PDEP "{pdep}" status "{status}" is unknown. '
+                        f"Should be one of: {KNOWN_STATUS}"
+                    )
+            html_file = pdep.with_suffix(".html").name
+            context["pdeps"][status].append(
+                {
+                    "title": title,
+                    "url": f"/pdeps/{html_file}",
+                }
+            )
+
+        # under discussion
+        github_repo_url = context["main"]["github_repo_url"]
+        resp = requests.get(
+            "https://api.github.com/search/issues?"
+            f"q=is:pr is:open label:PDEP repo:{github_repo_url}"
+        )
+        if context["ignore_io_errors"] and resp.status_code == 403:
+            return context
+        resp.raise_for_status()
+
+        for pdep in resp.json()["items"]:
+            context["pdeps"]["Under discussion"].append(
+                {"title": pdep["title"], "url": pdep["html_url"]}
+            )
+
+        return context
+
 
 def get_callable(obj_as_str: str) -> object:
     """
