diff --git a/web/pandas/about/roadmap.md b/web/pandas/about/roadmap.md index 3c6c4d4fdf9a2..6e922d01518ba 100644 --- a/web/pandas/about/roadmap.md +++ b/web/pandas/about/roadmap.md @@ -15,10 +15,35 @@ fundamental changes to the project that are likely to take months or years of developer time. Smaller-scoped items will continue to be tracked on our [issue tracker](https://github.com/pandas-dev/pandas/issues). -See [Roadmap evolution](#roadmap-evolution) for proposing -changes to this document. +The roadmap is defined as a set of major enhancement proposals named PDEPs. +For more information about PDEPs, and how to submit one, please refer to +[PDEP-1](/pdeps/0001-purpose-and-guidelines.html). -## Extensibility +## PDEPs + +{% for pdep_type in ["Under discussion", "Accepted", "Implemented", "Rejected"] %} + +

{{ pdep_type.replace("_", " ").capitalize() }}

+ + + +{% endfor %} + +## Roadmap points pending a PDEP + + + +### Extensibility Pandas `extending.extension-types` allow for extending NumPy types with custom data types and array storage. @@ -33,7 +58,7 @@ library, making their behavior more consistent with the handling of NumPy arrays. We'll do this by cleaning up pandas' internals and adding new methods to the extension array interface. -## String data type +### String data type Currently, pandas stores text data in an `object` -dtype NumPy array. The current implementation has two primary drawbacks: First, `object` @@ -54,7 +79,7 @@ work, we may need to implement certain operations expected by pandas users (for example the algorithm used in, `Series.str.upper`). That work may be done outside of pandas. -## Apache Arrow interoperability +### Apache Arrow interoperability [Apache Arrow](https://arrow.apache.org) is a cross-language development platform for in-memory data. The Arrow logical types are closely aligned @@ -65,7 +90,7 @@ data types within pandas. This will let us take advantage of its I/O capabilities and provide for better interoperability with other languages and libraries using Arrow. -## Block manager rewrite +### Block manager rewrite We'd like to replace pandas current internal data structures (a collection of 1 or 2-D arrays) with a simpler collection of 1-D arrays. @@ -92,7 +117,7 @@ See [these design documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals) for more. -## Decoupling of indexing and internals +### Decoupling of indexing and internals The code for getting and setting values in pandas' data structures needs refactoring. In particular, we must clearly separate code that @@ -150,7 +175,7 @@ which are actually expected (typically `KeyError`). and when small differences in behavior are expected (e.g. getting with `.loc` raises for missing labels, setting still doesn't), they can be managed with a specific parameter. 
-## Numba-accelerated operations +### Numba-accelerated operations [Numba](https://numba.pydata.org) is a JIT compiler for Python code. We'd like to provide ways for users to apply their own Numba-jitted @@ -162,7 +187,7 @@ window contexts). This will improve the performance of user-defined-functions in these operations by staying within compiled code. -## Documentation improvements +### Documentation improvements We'd like to improve the content, structure, and presentation of the pandas documentation. Some specific goals include @@ -177,7 +202,7 @@ pandas documentation. Some specific goals include subsections of the documentation to make navigation and finding content easier. -## Performance monitoring +### Performance monitoring Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/) to monitor for performance regressions. ASV itself is a fabulous tool, @@ -197,29 +222,3 @@ We'd like to fund improvements and maintenance of these tools to - Build a GitHub bot to request ASV runs *before* a PR is merged. Currently, the benchmarks are only run nightly. - -## Roadmap Evolution - -Pandas continues to evolve. The direction is primarily determined by -community interest. Everyone is welcome to review existing items on the -roadmap and to propose a new item. - -Each item on the roadmap should be a short summary of a larger design -proposal. The proposal should include - -1. Short summary of the changes, which would be appropriate for - inclusion in the roadmap if accepted. -2. Motivation for the changes. -3. An explanation of why the change is in scope for pandas. -4. Detailed design: Preferably with example-usage (even if not - implemented yet) and API documentation -5. API Change: Any API changes that may result from the proposal. - -That proposal may then be submitted as a GitHub issue, where the pandas -maintainers can review and comment on the design. 
The [pandas mailing -list](https://mail.python.org/mailman/listinfo/pandas-dev) should be -notified of the proposal. - -When there's agreement that an implementation would be welcome, the -roadmap should be updated to include the summary and a link to the -discussion issue. diff --git a/web/pandas/config.yml b/web/pandas/config.yml index 1330addf9a229..aa4deaea98a6c 100644 --- a/web/pandas/config.yml +++ b/web/pandas/config.yml @@ -11,6 +11,7 @@ main: - pandas_web.Preprocessors.blog_add_posts - pandas_web.Preprocessors.maintainers_add_info - pandas_web.Preprocessors.home_add_releases + - pandas_web.Preprocessors.roadmap_pdeps markdown_extensions: - toc - tables @@ -177,3 +178,5 @@ sponsors: - name: "Gousto" url: https://www.gousto.co.uk/ kind: partner +roadmap: + pdeps_path: pdeps diff --git a/web/pandas/pdeps/0001-purpose-and-guidelines.md b/web/pandas/pdeps/0001-purpose-and-guidelines.md new file mode 100644 index 0000000000000..085f675974b2e --- /dev/null +++ b/web/pandas/pdeps/0001-purpose-and-guidelines.md @@ -0,0 +1,128 @@ +# PDEP-1: Purpose and guidelines + +- Created: 3 August 2022 +- Status: Under discussion +- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444) +- Author: [Marc Garcia](https://github.com/datapythonista) +- Revision: 1 + +## PDEP definition, purpose and scope + +A PDEP (pandas enhancement proposal) is a proposal for a **major** change in +pandas, similar to a Python [PEP](https://peps.python.org/pep-0001/) +or a NumPy [NEP](https://numpy.org/neps/nep-0000.html). + +Bug fixes and conceptually minor changes (e.g. adding a parameter to a function) +are out of the scope of PDEPs. A PDEP should be used for changes that are not +immediate and not obvious, and that are expected to require a significant amount of +discussion and detailed documentation before being implemented. + +PDEPs are appropriate for user-facing changes, internal changes and organizational +discussions. 
Examples of topics worth a PDEP could include moving a module from +pandas to a separate repository, a refactoring of the pandas block manager or +a proposal of a new code of conduct. + +## PDEP guidelines + +### Target audience + +A PDEP is a public document available to anyone, but the main stakeholders to +consider when writing a PDEP are: + +- The core development team, who will have the final decision on whether a PDEP + is approved or not +- Contributors to pandas and other related projects, and experienced users. Their + feedback is highly encouraged and appreciated, to make sure all points of view + are taken into consideration +- The wider pandas community, in particular users, who may or may not have feedback + on the proposal, but should know and be able to understand the future direction of + the project + +### PDEP authors + +Anyone can propose a PDEP, but in most cases developers of pandas itself and related +projects are expected to author PDEPs. If you are unsure whether you should be opening +an issue or creating a PDEP, it's probably safe to start by +[opening an issue](https://github.com/pandas-dev/pandas/issues/new/choose), which can +eventually be moved to a PDEP. + +### Workflow + +The possible states of a PDEP are: + +- Under discussion +- Accepted +- Implemented +- Rejected + +The workflow that PDEPs can follow is described next. + +#### Submitting a PDEP + +Proposing a PDEP is done by creating a PR adding a new file to `web/pandas/pdeps/`. +The file is a markdown file; you can use `web/pandas/pdeps/0001-purpose-and-guidelines.md` as a reference +for the expected format. + +The initial status of a PDEP will be `Status: Under discussion`. This will be changed +to `Status: Accepted` when the PDEP is ready and has the approval of the core team. + +#### Accepted PDEP + +A PDEP can only be accepted by the core development team, if the proposal is considered +worth implementing. 
Decisions will be made based on the process detailed in the +[pandas governance document](https://github.com/pandas-dev/pandas-governance/blob/master/governance.md). +In general, more than one approval will be needed before the PR is merged, and +there should not be any `Request changes` review at the time of merging. + +Once a PDEP is accepted, any contributions can be made toward the implementation of the PDEP, +with an open-ended completion timeline. Development of pandas is difficult to understand and +forecast, since the contributors to pandas are a mix of volunteers and developers paid from different sources, +with different priorities. Companies, institutions or individuals interested in seeing a +PDEP implemented, or in seeing progress on the pandas roadmap in general, can check how +to help on the [contributing page](/contribute.html). + +#### Implemented PDEP + +Once a PDEP is implemented and available in the main branch of pandas, its +status will be changed to `Status: Implemented`, so there is visibility that the PDEP +is no longer part of the roadmap and future plans, but a change that has already +happened. The first pandas version in which the PDEP implementation is +available will also be included in the PDEP header with, for example, +`Implemented: v2.0.0`. + +#### Rejected PDEP + +A PDEP can be rejected when the final decision is that its implementation is +not in the best interests of the project. Rejected PDEPs are as useful as accepted +PDEPs, since there are discussions that are worth having, and decisions about +changes to pandas being made. They will be merged with `Status: Rejected`, so +there is visibility on what was discussed and what the outcome of the +discussion was. A PDEP can be rejected for different reasons, for example a good idea +that isn't backward-compatible, where the breaking changes aren't considered worth +implementing. 
+ +#### Invalid PDEP + +For submitted PDEPs that do not contain proper documentation, are out of scope, or +are not useful to the community for any other reason, the PR will be closed after +discussion with the author, instead of merging them as rejected. This is to avoid +adding noise to the list of rejected PDEPs, which should contain documentation as +good as an accepted PDEP, but where the final decision was to not implement the changes. + +## Evolution of PDEPs + +Most PDEPs aren't expected to change after being accepted. Once there is agreement on the changes, +and they are implemented, the PDEP will only be useful to understand why the development happened, +and the details of the discussion. + +In some cases, however, a PDEP can be updated, for example a PDEP defining a procedure or +a policy, like this one (PDEP-1), or when, after attempting the implementation, +new knowledge is obtained that makes the original PDEP obsolete and changes are +required. When there are specific changes to be made to the original PDEP, it will +be edited, its `Revision: X` label will be increased by one, and a note will be added +to the `PDEP-N History` section. This will let readers understand that the PDEP has +changed and avoid confusion. 
+ +### PDEP-1 History + +- 3 August 2022: Initial version diff --git a/web/pandas/static/css/pandas.css b/web/pandas/static/css/pandas.css index d5112dd220355..96ea6a6f2ae52 100644 --- a/web/pandas/static/css/pandas.css +++ b/web/pandas/static/css/pandas.css @@ -8,15 +8,19 @@ h1 { color: #130654; } h2 { - font-size: 1.45rem; + font-size: 1.8rem; font-weight: 700; - color: black; + color: #130654; } h3 { font-size: 1.3rem; font-weight: 600; color: black; } +h3 a { + color: black; + text-decoration: underline dotted !important; +} a { color: #130654; } diff --git a/web/pandas_web.py b/web/pandas_web.py index 7dd63175e69ac..16e9024d8d1d8 100755 --- a/web/pandas_web.py +++ b/web/pandas_web.py @@ -24,10 +24,12 @@ The rest of the items in the file will be added directly to the context. """ import argparse +import collections import datetime import importlib import operator import os +import pathlib import re import shutil import sys @@ -185,6 +187,61 @@ def home_add_releases(context): ) return context + @staticmethod + def roadmap_pdeps(context): + """ + PDEPs (pandas enhancement proposals) are not part of the navigation + bar. They are included as lists in the "Roadmap" page + and linked from there. This preprocessor obtains the list of + PDEPs in each status from the directory tree and GitHub. + """ + KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"} + context["pdeps"] = collections.defaultdict(list) + + # accepted, rejected and implemented + pdeps_path = ( + pathlib.Path(context["source_path"]) / context["roadmap"]["pdeps_path"] + ) + for pdep in sorted(pdeps_path.iterdir()): + if pdep.suffix != ".md": + continue + with pdep.open() as f: + title = f.readline()[2:] # removing markdown title "# " + status = None + for line in f: + if line.startswith("- Status: "): + status = line.strip().split(": ", 1)[1] + break + if status not in KNOWN_STATUS: + raise RuntimeError( + f'PDEP "{pdep}" status "{status}" is unknown. 
' + f"Should be one of: {KNOWN_STATUS}" + ) + html_file = pdep.with_suffix(".html").name + context["pdeps"][status].append( + { + "title": title, + "url": f"/pdeps/{html_file}", + } + ) + + # under discussion + github_repo_url = context["main"]["github_repo_url"] + resp = requests.get( + "https://api.github.com/search/issues?" + f"q=is:pr is:open label:PDEP repo:{github_repo_url}" + ) + if context["ignore_io_errors"] and resp.status_code == 403: + return context + resp.raise_for_status() + + for pdep in resp.json()["items"]: + context["pdeps"]["Under discussion"].append( + {"title": pdep["title"], "url": pdep["url"]} + ) + + return context + def get_callable(obj_as_str: str) -> object: """