PDEP-1: Purpose and guidelines for pandas enhancement proposals (#47444)

datapythonista · web-flow · commit 62646b5495db · 2022-08-03T12:53:58.000+07:00
diff --git a/web/pandas/about/roadmap.md b/web/pandas/about/roadmap.md
@@ -15,10 +15,35 @@ fundamental changes to the project that are likely to take months or
 years of developer time. Smaller-scoped items will continue to be
 tracked on our [issue tracker](https://github.com/pandas-dev/pandas/issues).
 
-See [Roadmap evolution](#roadmap-evolution) for proposing
-changes to this document.
+The roadmap is defined as a set of major enhancement proposals named PDEPs.
+For more information about PDEPs, and how to submit one, please refer to
+[PDEP-1](/pdeps/accepted/0001-purpose-and-guidelines.html).
 
-## Extensibility
+## PDEPs
+
+{% for pdep_type in ["Under discussion", "Accepted", "Implemented", "Rejected"] %}
+
+<h3 id="pdeps-{{pdep_type}}">{{ pdep_type.replace("_", " ").capitalize() }}</h3>
+
+<ul>
+{% for pdep in pdeps[pdep_type] %}
+    <li><a href="{{ pdep.url }}">{{ pdep.title }}</a></li>
+{% else %}
+    <li>There are currently no PDEPs with this status</li>
+{% endfor %}
+</ul>
+
+{% endfor %}
+
+## Roadmap points pending a PDEP
+
+<div class="alert alert-warning" role="alert">
+  pandas is in the process of moving roadmap points to PDEPs (implemented in
+  August 2022). During the transition, some roadmap points will exist as PDEPs,
+  while others will exist as sections below.
+</div>
+
+### Extensibility
 
 Pandas `extending.extension-types` allow
 for extending NumPy types with custom data types and array storage.
@@ -33,7 +58,7 @@ library, making their behavior more consistent with the handling of
 NumPy arrays. We'll do this by cleaning up pandas' internals and
 adding new methods to the extension array interface.
 
-## String data type
+### String data type
 
 Currently, pandas stores text data in an `object` -dtype NumPy array.
 The current implementation has two primary drawbacks: First, `object`
@@ -54,7 +79,7 @@ work, we may need to implement certain operations expected by pandas
 users (for example the algorithm used in, `Series.str.upper`). That work
 may be done outside of pandas.
 
-## Apache Arrow interoperability
+### Apache Arrow interoperability
 
 [Apache Arrow](https://arrow.apache.org) is a cross-language development
 platform for in-memory data. The Arrow logical types are closely aligned
@@ -65,7 +90,7 @@ data types within pandas. This will let us take advantage of its I/O
 capabilities and provide for better interoperability with other
 languages and libraries using Arrow.
 
-## Block manager rewrite
+### Block manager rewrite
 
 We'd like to replace pandas current internal data structures (a
 collection of 1 or 2-D arrays) with a simpler collection of 1-D arrays.
@@ -92,7 +117,7 @@ See [these design
 documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals)
 for more.
 
-## Decoupling of indexing and internals
+### Decoupling of indexing and internals
 
 The code for getting and setting values in pandas' data structures
 needs refactoring. In particular, we must clearly separate code that
@@ -150,7 +175,7 @@ which are actually expected (typically `KeyError`).
 and when small differences in behavior are expected (e.g. getting with `.loc` raises for
 missing labels, setting still doesn't), they can be managed with a specific parameter.
 
-## Numba-accelerated operations
+### Numba-accelerated operations
 
 [Numba](https://numba.pydata.org) is a JIT compiler for Python code.
 We'd like to provide ways for users to apply their own Numba-jitted
@@ -162,7 +187,7 @@ window contexts). This will improve the performance of
 user-defined-functions in these operations by staying within compiled
 code.
 
-## Documentation improvements
+### Documentation improvements
 
 We'd like to improve the content, structure, and presentation of the
 pandas documentation. Some specific goals include
@@ -177,7 +202,7 @@ pandas documentation. Some specific goals include
     subsections of the documentation to make navigation and finding
     content easier.
 
-## Performance monitoring
+### Performance monitoring
 
 Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
 to monitor for performance regressions. ASV itself is a fabulous tool,
@@ -197,29 +222,3 @@ We'd like to fund improvements and maintenance of these tools to
     <https://pyperf.readthedocs.io/en/latest/system.html>
 -   Build a GitHub bot to request ASV runs *before* a PR is merged.
     Currently, the benchmarks are only run nightly.
-
-## Roadmap Evolution
-
-Pandas continues to evolve. The direction is primarily determined by
-community interest. Everyone is welcome to review existing items on the
-roadmap and to propose a new item.
-
-Each item on the roadmap should be a short summary of a larger design
-proposal. The proposal should include
-
-1.  Short summary of the changes, which would be appropriate for
-    inclusion in the roadmap if accepted.
-2.  Motivation for the changes.
-3.  An explanation of why the change is in scope for pandas.
-4.  Detailed design: Preferably with example-usage (even if not
-    implemented yet) and API documentation
-5.  API Change: Any API changes that may result from the proposal.
-
-That proposal may then be submitted as a GitHub issue, where the pandas
-maintainers can review and comment on the design. The [pandas mailing
-list](https://mail.python.org/mailman/listinfo/pandas-dev) should be
-notified of the proposal.
-
-When there's agreement that an implementation would be welcome, the
-roadmap should be updated to include the summary and a link to the
-discussion issue.
diff --git a/web/pandas/config.yml b/web/pandas/config.yml
@@ -11,6 +11,7 @@ main:
   - pandas_web.Preprocessors.blog_add_posts
   - pandas_web.Preprocessors.maintainers_add_info
   - pandas_web.Preprocessors.home_add_releases
+  - pandas_web.Preprocessors.roadmap_pdeps
   markdown_extensions:
   - toc
   - tables
@@ -177,3 +178,5 @@ sponsors:
   - name: "Gousto"
     url: https://www.gousto.co.uk/
     kind: partner
+roadmap:
+  pdeps_path: pdeps
diff --git a/web/pandas/pdeps/0001-purpose-and-guidelines.md b/web/pandas/pdeps/0001-purpose-and-guidelines.md
@@ -0,0 +1,128 @@
+# PDEP-1: Purpose and guidelines
+
+- Created: 3 August 2022
+- Status: Under discussion
+- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444)
+- Author: [Marc Garcia](https://github.com/datapythonista)
+- Revision: 1
+
+## PDEP definition, purpose and scope
+
+A PDEP (pandas enhancement proposal) is a proposal for a **major** change in
+pandas, similar to a Python [PEP](https://peps.python.org/pep-0001/)
+or a NumPy [NEP](https://numpy.org/neps/nep-0000.html).
+
+Bug fixes and conceptually minor changes (e.g. adding a parameter to a function)
+are out of the scope of PDEPs. A PDEP should be used for changes that are not
+immediate and not obvious, and are expected to require a significant amount of
+discussion and require detailed documentation before being implemented.
+
+PDEPs are appropriate for user-facing changes, internal changes and organizational
+discussions. Examples of topics worth a PDEP could include moving a module from
+pandas to a separate repository, a refactoring of the pandas block manager or
+a proposal of a new code of conduct.
+
+## PDEP guidelines
+
+### Target audience
+
+A PDEP is a public document available to anyone, but the main stakeholders to
+consider when writing a PDEP are:
+
+- The core development team, who will have the final decision on whether a PDEP
+  is approved or not
+- Contributors to pandas and other related projects, and experienced users. Their
+  feedback is highly encouraged and appreciated, to make sure all points of view
+  are taken into consideration
+- The wider pandas community, in particular users, who may or may not have feedback
+  on the proposal, but should know and be able to understand the future direction of
+  the project
+
+### PDEP authors
+
+Anyone can propose a PDEP, but in most cases developers of pandas itself and related
+projects are expected to author PDEPs. If you are unsure if you should be opening
+an issue or creating a PDEP, it's probably safe to start by
+[opening an issue](https://github.com/pandas-dev/pandas/issues/new/choose), which can
+eventually be moved to a PDEP.
+
+### Workflow
+
+The possible states of a PDEP are:
+
+- Under discussion
+- Accepted
+- Implemented
+- Rejected
+
+The workflow that PDEPs can follow is described next.
+
+#### Submitting a PDEP
+
+Proposing a PDEP is done by creating a PR adding a new file to `web/pandas/pdeps/`.
+The file is a markdown file; you can use `web/pandas/pdeps/0001-purpose-and-guidelines.md`
+as a reference for the expected format.
+
+The initial status of a PDEP will be `Status: Under discussion`. This will be changed
+to `Status: Accepted` when the PDEP is ready and has the approval of the core team.
+
+#### Accepted PDEP
+
+A PDEP can only be accepted by the core development team, if the proposal is considered
+worth implementing. Decisions will be made based on the process detailed in the
+[pandas governance document](https://github.com/pandas-dev/pandas-governance/blob/master/governance.md).
+In general, more than one approval will be needed before the PR is merged, and
+there should not be any `Request changes` review at the time of merging.
+
+Once a PDEP is accepted, any contributions can be made toward the implementation of the PDEP,
+with an open-ended completion timeline. Development of pandas is difficult to understand and
+forecast, since the contributors to pandas are a mix of volunteers and developers paid from different sources,
+with different priorities. Companies, institutions or individuals interested in seeing a
+PDEP implemented, or in seeing progress on the pandas roadmap in general, can check how
+to help on the [contributing page](/contribute.html).
+
+#### Implemented PDEP
+
+Once a PDEP is implemented and available in the main branch of pandas, its
+status will be changed to `Status: Implemented`, so there is visibility that the PDEP
+is no longer part of the roadmap and future plans, but a change that has already
+happened. The first pandas version in which the PDEP implementation is
+available will also be included in the PDEP header with, for example,
+`Implemented: v2.0.0`.
+
+#### Rejected PDEP
+
+A PDEP can be rejected when the final decision is that its implementation is
+not in the best interests of the project. Rejected PDEPs are as useful as accepted
+PDEPs, since there are discussions that are worth having, and decisions about
+changes to pandas being made. They will be merged with `Status: Rejected`, so
+there is visibility on what was discussed and what was the outcome of the
+discussion. A PDEP can be rejected for different reasons, for example a good idea
+that isn't backward-compatible, where the breaking changes aren't considered worth
+implementing.
+
+#### Invalid PDEP
+
+For submitted PDEPs that do not contain proper documentation, are out of scope, or
+are not useful to the community for any other reason, the PR will be closed after
+discussion with the author, instead of merging them as rejected. This is to avoid
+adding noise to the list of rejected PDEPs, which should contain documentation as
+good as an accepted PDEP, but where the final decision was to not implement the changes.
+
+## Evolution of PDEPs
+
+Most PDEPs aren't expected to change after being accepted. Once there is agreement on the changes,
+and they are implemented, the PDEP will only be useful to understand why the development happened,
+and the details of the discussion.
+
+But in some cases, a PDEP can be updated. Examples are a PDEP defining a procedure or
+a policy, like this one (PDEP-1), or cases where, after attempting the implementation,
+new knowledge is obtained that makes the original PDEP obsolete, and changes are
+required. When there are specific changes to be made to the original PDEP, it will
+be edited, its `Revision: X` label will be increased by one, and a note will be added
+to the `PDEP-N history` section. This will let readers understand that the PDEP has
+changed and avoid confusion.
+
+### PDEP-1 History
+
+- 3 August 2022: Initial version
diff --git a/web/pandas/static/css/pandas.css b/web/pandas/static/css/pandas.css
@@ -8,15 +8,19 @@ h1 {
     color: #130654;
 }
 h2 {
-    font-size: 1.45rem;
+    font-size: 1.8rem;
     font-weight: 700;
-    color: black;
+    color: #130654;
 }
 h3 {
     font-size: 1.3rem;
     font-weight: 600;
     color: black;
 }
+h3 a {
+    color: black;
+    text-decoration: underline dotted !important;
+}
 a {
     color: #130654;
 }
diff --git a/web/pandas_web.py b/web/pandas_web.py
@@ -24,10 +24,12 @@
 The rest of the items in the file will be added directly to the context.
 """
 import argparse
+import collections
 import datetime
 import importlib
 import operator
 import os
+import pathlib
 import re
 import shutil
 import sys
@@ -185,6 +187,61 @@ def home_add_releases(context):
             )
         return context
 
+    @staticmethod
+    def roadmap_pdeps(context):
+        """
+        PDEPs (pandas enhancement proposals) are not part of the navigation
+        bar. They are included as lists in the "Roadmap" page
+        and linked from there. This preprocessor obtains the list of
+        PDEPs in each status from the directory tree and GitHub.
+        """
+        KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"}
+        context["pdeps"] = collections.defaultdict(list)
+
+        # accepted, rejected and implemented
+        pdeps_path = (
+            pathlib.Path(context["source_path"]) / context["roadmap"]["pdeps_path"]
+        )
+        for pdep in sorted(pdeps_path.iterdir()):
+            if pdep.suffix != ".md":
+                continue
+            with pdep.open() as f:
+                title = f.readline()[2:].strip()  # drop markdown title prefix "# "
+                status = None
+                for line in f:
+                    if line.startswith("- Status: "):
+                        status = line.strip().split(": ", 1)[1]
+                        break
+                if status not in KNOWN_STATUS:
+                    raise RuntimeError(
+                        f'PDEP "{pdep}" status "{status}" is unknown. '
+                        f"Should be one of: {KNOWN_STATUS}"
+                    )
+            html_file = pdep.with_suffix(".html").name
+            context["pdeps"][status].append(
+                {
+                    "title": title,
+                    "url": f"/pdeps/{html_file}",
+                }
+            )
+
+        # under discussion
+        github_repo_url = context["main"]["github_repo_url"]
+        resp = requests.get(
+            "https://api.github.com/search/issues?"
+            f"q=is:pr is:open label:PDEP repo:{github_repo_url}"
+        )
+        if context["ignore_io_errors"] and resp.status_code == 403:
+            return context
+        resp.raise_for_status()
+
+        for pdep in resp.json()["items"]:
+            context["pdeps"]["Under discussion"].append(
+                {"title": pdep["title"], "url": pdep["html_url"]}
+            )
+
+        return context
+
 
 def get_callable(obj_as_str: str) -> object:
     """
