Commit 079adc7

Merge branch 'pandas-dev:main' into bug-cov-nat
2 parents de954f7 + f1b00b8 commit 079adc7

147 files changed: +2607 -768 lines changed

.github/CODEOWNERS

Lines changed: 0 additions & 3 deletions
@@ -4,9 +4,6 @@
 # ci
 ci/ @mroeschke

-# web
-web/ @datapythonista
-
 # docs
 doc/cheatsheet @Dr-Irv
 doc/source/development @noatamir

.github/actions/run-tests/action.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ runs:
   if: failure()

 - name: Upload coverage to Codecov
-  uses: codecov/codecov-action@v4
+  uses: codecov/codecov-action@v5
   with:
     flags: unittests
     name: codecov-pandas

.github/actions/setup-conda/action.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ runs:
 using: composite
 steps:
 - name: Install ${{ inputs.environment-file }}
-  uses: mamba-org/setup-micromamba@v1
+  uses: mamba-org/setup-micromamba@v2
   with:
     environment-file: ${{ inputs.environment-file }}
     environment-name: test

.github/workflows/docbuild-and-upload.yml

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@ jobs:
 - name: Build documentation
   run: doc/make.py --warnings-are-errors

+- name: Build the interactive terminal
+  working-directory: web/interactive_terminal
+  run: jupyter lite build
+
 - name: Build documentation zip
   run: doc/make.py zip_html

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ jobs:
   run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

 - name: Build wheels
-  uses: pypa/cibuildwheel@v2.22.0
+  uses: pypa/cibuildwheel@v2.23.0
   with:
     package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
   env:

.pre-commit-config.yaml

Lines changed: 7 additions & 2 deletions
@@ -19,7 +19,7 @@ ci:
   skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.9.4
+  rev: v0.9.9
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -70,7 +70,7 @@ repos:
   - id: trailing-whitespace
     args: [--markdown-linebreak-ext=md]
 - repo: https://github.com/PyCQA/isort
-  rev: 6.0.0
+  rev: 6.0.1
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
@@ -106,6 +106,11 @@ repos:
   hooks:
   - id: meson-fmt
     args: ['--inplace']
+- repo: https://github.com/shellcheck-py/shellcheck-py
+  rev: v0.10.0.1
+  hooks:
+  - id: shellcheck
+    args: ["--severity=warning"]
 - repo: local
   hooks:
   - id: pyright

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -13,5 +13,5 @@ COPY requirements-dev.txt /tmp
 RUN python -m pip install -r /tmp/requirements-dev.txt
 RUN git config --global --add safe.directory /home/pandas

-ENV SHELL "/bin/bash"
+ENV SHELL="/bin/bash"
 CMD ["/bin/bash"]

ci/code_checks.sh

Lines changed: 9 additions & 9 deletions
@@ -24,15 +24,15 @@ else
 fi

 [[ -z "$CHECK" || "$CHECK" == "code" || "$CHECK" == "doctests" || "$CHECK" == "docstrings" || "$CHECK" == "single-docs" || "$CHECK" == "notebooks" ]] || \
-    { echo "Unknown command $1. Usage: $0 [code|doctests|docstrings|single-docs|notebooks]"; exit 9999; }
+    { echo "Unknown command $1. Usage: $0 [code|doctests|docstrings|single-docs|notebooks]"; exit 1; }

-BASE_DIR="$(dirname $0)/.."
+BASE_DIR="$(dirname "$0")/.."
 RET=0

 ### CODE ###
 if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then

-    MSG='Check import. No warnings, and blocklist some optional dependencies' ; echo $MSG
+    MSG='Check import. No warnings, and blocklist some optional dependencies' ; echo "$MSG"
     python -W error -c "
 import sys
 import pandas
@@ -49,24 +49,24 @@ if mods:
     sys.stderr.write('err: pandas should not import: {}\n'.format(', '.join(mods)))
     sys.exit(len(mods))
     "
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
+    RET=$(($RET + $?)) ; echo "$MSG" "DONE"

 fi

 ### DOCTESTS ###
 if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

-    MSG='Python and Cython Doctests' ; echo $MSG
+    MSG='Python and Cython Doctests' ; echo "$MSG"
     python -c 'import pandas as pd; pd.test(run_doctests=True)'
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
+    RET=$(($RET + $?)) ; echo "$MSG" "DONE"

 fi

 ### DOCSTRINGS ###
 if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

-    MSG='Validate Docstrings' ; echo $MSG
-    $BASE_DIR/scripts/validate_docstrings.py \
+    MSG='Validate Docstrings' ; echo "$MSG"
+    "$BASE_DIR"/scripts/validate_docstrings.py \
         --format=actions \
         -i ES01 `# For now it is ok if docstrings are missing the extended summary` \
         -i "pandas.Series.dt PR01" `# Accessors are implemented as classes, but we do not document the Parameters section` \
@@ -265,7 +265,7 @@ fi
 if [[ -z "$CHECK" || "$CHECK" == "notebooks" ]]; then

     MSG='Notebooks' ; echo $MSG
-    jupyter nbconvert --execute $(find doc/source -name '*.ipynb') --to notebook
+    jupyter nbconvert --execute "$(find doc/source -name '*.ipynb')" --to notebook
     RET=$(($RET + $?)) ; echo $MSG "DONE"

 fi
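
Note on the `exit 9999` to `exit 1` change in ci/code_checks.sh: the commit does not state a reason, so the following is an assumption. On POSIX systems a process exit status is reported to the parent as a single byte, so a requested status above 255 is not what the caller sees. The sketch below uses a child Python process rather than the shell script itself; `child_exit_status` is a hypothetical helper written for this illustration.

```python
import subprocess
import sys


def child_exit_status(code: int) -> int:
    """Start a child interpreter that calls sys.exit(code); return the status the parent observes."""
    result = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return result.returncode


if __name__ == "__main__":
    # On POSIX the parent only sees the low 8 bits of the requested status.
    for requested in (1, 255, 9999):
        print(requested, "->", child_exit_status(requested))
```

On a POSIX system the last line is expected to report 15 (9999 modulo 256) rather than 9999, which is why a small status such as 1 is the more predictable choice. Behaviour on other platforms may differ.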

ci/run_tests.sh

Lines changed: 3 additions & 5 deletions
@@ -3,10 +3,8 @@
 # Workaround for pytest-xdist (it collects different tests in the workers if PYTHONHASHSEED is not set)
 # https://github.com/pytest-dev/pytest/issues/920
 # https://github.com/pytest-dev/pytest/issues/1075
-export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
-
-# May help reproduce flaky CI builds if set in subsequent runs
-echo PYTHONHASHSEED=$PYTHONHASHSEED
+PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
+export PYTHONHASHSEED

 COVERAGE="-s --cov=pandas --cov-report=xml --cov-append --cov-config=pyproject.toml"

@@ -16,5 +14,5 @@ if [[ "$PATTERN" ]]; then
   PYTEST_CMD="$PYTEST_CMD -m \"$PATTERN\""
 fi

-echo $PYTEST_CMD
+echo "$PYTEST_CMD"
 sh -c "$PYTEST_CMD"
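
Note on PYTHONHASHSEED in ci/run_tests.sh: the script comment says pytest-xdist workers can collect different tests when the seed is unset. A minimal sketch of the underlying behaviour, assuming only the standard library: string hashing in a fresh interpreter is repeatable when the same seed is exported. `hash_in_child` is a hypothetical helper, not part of the pandas CI scripts.

```python
import os
import subprocess
import sys


def hash_in_child(text: str, seed: str) -> str:
    """Return hash(text) as printed by a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Two separate processes with the same seed agree on the hash value.
    first = hash_in_child("pandas", "123")
    second = hash_in_child("pandas", "123")
    print(first == second)
```

Every worker process that inherits the exported variable therefore hashes strings identically, which is the property the workaround relies on.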

ci/upload_wheels.sh

Lines changed: 5 additions & 4 deletions
@@ -1,3 +1,4 @@
+#!/bin/bash
 # Modified from numpy's https://github.com/numpy/numpy/blob/main/tools/wheels/upload_wheels.sh

 set_upload_vars() {
@@ -19,20 +20,20 @@ set_upload_vars() {
     fi
 }
 upload_wheels() {
-    echo ${PWD}
+    echo "${PWD}"
     if [[ ${ANACONDA_UPLOAD} == true ]]; then
-        if [ -z ${TOKEN} ]; then
+        if [ -z "${TOKEN}" ]; then
             echo no token set, not uploading
         else
             # sdists are located under dist folder when built through setup.py
             if compgen -G "./dist/*.gz"; then
                 echo "Found sdist"
-                anaconda -q -t ${TOKEN} upload --skip -u ${ANACONDA_ORG} ./dist/*.gz
+                anaconda -q -t "${TOKEN}" upload --skip -u "${ANACONDA_ORG}" ./dist/*.gz
                 echo "Uploaded sdist"
             fi
             if compgen -G "./wheelhouse/*.whl"; then
                 echo "Found wheel"
-                anaconda -q -t ${TOKEN} upload --skip -u ${ANACONDA_ORG} ./wheelhouse/*.whl
+                anaconda -q -t "${TOKEN}" upload --skip -u "${ANACONDA_ORG}" ./wheelhouse/*.whl
                 echo "Uploaded wheel"
             fi
             echo "PyPI-style index: https://pypi.anaconda.org/$ANACONDA_ORG/simple"

doc/source/development/contributing_codebase.rst

Lines changed: 2 additions & 2 deletions
@@ -198,7 +198,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
            obj = cast(str, obj)  # Mypy complains without this!
            return obj.upper()

-The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable

 .. code-block:: python

@@ -344,7 +344,7 @@ be located.
    - tests.scalar
    - tests.tseries.offsets

-2. Does your test depend only on code in pd._libs?
+2. Does your test depend only on code in ``pd._libs``?
    This test likely belongs in one of:

    - tests.libs

doc/source/development/contributing_gitpod.rst

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ development experience:

 * `VSCode rst extension <https://marketplace.visualstudio.com/items?itemName=lextudio.restructuredtext>`_
 * `Markdown All in One <https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one>`_
-* `VSCode Gitlens extension <https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens>`_
+* `VSCode GitLens extension <https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens>`_
 * `VSCode Git Graph extension <https://marketplace.visualstudio.com/items?itemName=mhutchie.git-graph>`_

 Development workflow with Gitpod

doc/source/development/developer.rst

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Column metadata
 * Boolean: ``'bool'``
 * Integers: ``'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64'``
 * Floats: ``'float16', 'float32', 'float64'``
-* Date and Time Types: ``'datetime', 'datetimetz'``, ``'timedelta'``
+* Date and Time Types: ``'datetime', 'datetimetz', 'timedelta'``
 * String: ``'unicode', 'bytes'``
 * Categorical: ``'categorical'``
 * Other Python objects: ``'object'``

doc/source/getting_started/intro_tutorials/03_subset_data.rst

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ the name ``anonymous`` to the first 3 elements of the fourth column:
 .. ipython:: python

     titanic.iloc[0:3, 3] = "anonymous"
-    titanic.head()
+    titanic.iloc[:5, 3]

 .. raw:: html

doc/source/getting_started/overview.rst

Lines changed: 1 addition & 0 deletions
@@ -174,3 +174,4 @@ License
 -------

 .. literalinclude:: ../../../LICENSE
+   :language: none

doc/source/reference/arrays.rst

Lines changed: 3 additions & 3 deletions
@@ -61,7 +61,7 @@ is an :class:`ArrowDtype`.
 support as NumPy including first-class nullability support for all data types, immutability and more.

 The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``.

 =============================================== ========================== ===================
 PyArrow type                                    pandas extension type      NumPy type
@@ -114,7 +114,7 @@ values.

    ArrowDtype

-For more information, please see the :ref:`PyArrow user guide <pyarrow>`
+For more information, please see the :ref:`PyArrow user guide <pyarrow>`.

 .. _api.arrays.datetime:

@@ -495,7 +495,7 @@ a :class:`CategoricalDtype`.
    CategoricalDtype.categories
    CategoricalDtype.ordered

-Categorical data can be stored in a :class:`pandas.Categorical`
+Categorical data can be stored in a :class:`pandas.Categorical`:

 .. autosummary::
    :toctree: api/

doc/source/reference/series.rst

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ Attributes
    Series.array
    Series.values
    Series.dtype
+   Series.info
    Series.shape
    Series.nbytes
    Series.ndim

doc/source/user_guide/io.rst

Lines changed: 2 additions & 2 deletions
@@ -18,10 +18,10 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     :widths: 30, 100, 60, 60

     text,`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__, :ref:`read_csv<io.read_csv_table>`, :ref:`to_csv<io.store_in_csv>`
-    text,Fixed-Width Text File, :ref:`read_fwf<io.fwf_reader>` , NA
+    text,Fixed-Width Text File, :ref:`read_fwf<io.fwf_reader>`, NA
     text,`JSON <https://www.json.org/>`__, :ref:`read_json<io.json_reader>`, :ref:`to_json<io.json_writer>`
     text,`HTML <https://en.wikipedia.org/wiki/HTML>`__, :ref:`read_html<io.read_html>`, :ref:`to_html<io.html>`
-    text,`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__, :ref:`Styler.to_latex<io.latex>` , NA
+    text,`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__, NA, :ref:`Styler.to_latex<io.latex>`
     text,`XML <https://www.w3.org/standards/xml/core>`__, :ref:`read_xml<io.read_xml>`, :ref:`to_xml<io.xml>`
     text, Local clipboard, :ref:`read_clipboard<io.clipboard>`, :ref:`to_clipboard<io.clipboard>`
     binary,`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__ , :ref:`read_excel<io.excel_reader>`, :ref:`to_excel<io.excel_writer>`

doc/source/user_guide/merging.rst

Lines changed: 1 addition & 1 deletion
@@ -906,7 +906,7 @@ resetting indexes.
 Joining multiple :class:`DataFrame`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A list or tuple of ``:class:`DataFrame``` can also be passed to :meth:`~DataFrame.join`
+A list or tuple of :class:`DataFrame` can also be passed to :meth:`~DataFrame.join`
 to join them together on their indexes.

 .. ipython:: python

doc/source/user_guide/text.rst

Lines changed: 9 additions & 9 deletions
@@ -13,7 +13,7 @@ Text data types

 There are two ways to store text data in pandas:

-1. ``object`` -dtype NumPy array.
+1. ``object`` dtype NumPy array.
 2. :class:`StringDtype` extension type.

 We recommend using :class:`StringDtype` to store text data.
@@ -40,20 +40,20 @@ to significantly increase the performance and lower the memory overhead of
 and parts of the API may change without warning.

 For backwards-compatibility, ``object`` dtype remains the default type we
-infer a list of strings to
+infer a list of strings to:

 .. ipython:: python

    pd.Series(["a", "b", "c"])

-To explicitly request ``string`` dtype, specify the ``dtype``
+To explicitly request ``string`` dtype, specify the ``dtype``:

 .. ipython:: python

    pd.Series(["a", "b", "c"], dtype="string")
    pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

-Or ``astype`` after the ``Series`` or ``DataFrame`` is created
+Or ``astype`` after the ``Series`` or ``DataFrame`` is created:

 .. ipython:: python

@@ -88,9 +88,9 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^

 These are places where the behavior of ``StringDtype`` objects differ from
-``object`` dtype
+``object`` dtype:

-l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
+1. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
    that return **numeric** output will always return a nullable integer dtype,
    rather than either int or float dtype, depending on the presence of NA values.
    Methods returning **boolean** output will return a nullable boolean dtype.
@@ -102,7 +102,7 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
       s.str.count("a")
       s.dropna().str.count("a")

-   Both outputs are ``Int64`` dtype. Compare that with object-dtype
+   Both outputs are ``Int64`` dtype. Compare that with object-dtype:

    .. ipython:: python

@@ -332,8 +332,8 @@ regular expression object will raise a ``ValueError``.
     ---------------------------------------------------------------------------
     ValueError: case and flags cannot be set when pat is a compiled regex

-``removeprefix`` and ``removesuffix`` have the same effect as ``str.removeprefix`` and ``str.removesuffix`` added in Python 3.9
-<https://docs.python.org/3/library/stdtypes.html#str.removeprefix>`__:
+``removeprefix`` and ``removesuffix`` have the same effect as ``str.removeprefix`` and ``str.removesuffix`` added in
+`Python 3.9 <https://docs.python.org/3/library/stdtypes.html#str.removeprefix>`__:

 .. versionadded:: 1.4.0
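
Note on the ``removeprefix`` / ``removesuffix`` sentence in text.rst: the docs say the pandas string methods have the same effect as the built-in ``str`` methods added in Python 3.9. A minimal sketch of that built-in behaviour, using plain Python strings only (no pandas; the sample names are made up for illustration):

```python
# Requires Python 3.9+, where str.removeprefix and str.removesuffix were added.
names = ["str_foo", "str_bar", "no_prefix"]

# The prefix is removed only when the string actually starts with it.
without_prefix = [name.removeprefix("str_") for name in names]

# Likewise, the suffix is removed only when the string ends with it.
without_suffix = [name.removesuffix("_prefix") for name in names]

print(without_prefix)  # ['foo', 'bar', 'no_prefix']
print(without_suffix)  # ['str_foo', 'str_bar', 'no']
```

Strings that do not carry the prefix or suffix are returned unchanged, which is the difference from slicing off a fixed number of characters.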

doc/source/user_guide/window.rst

Lines changed: 4 additions & 3 deletions
@@ -70,7 +70,8 @@ which will first group the data by the specified keys and then perform a windowi

 Some windowing aggregation, ``mean``, ``sum``, ``var`` and ``std`` methods may suffer from numerical
 imprecision due to the underlying windowing algorithms accumulating sums. When values differ
-with magnitude :math:`1/np.finfo(np.double).eps` this results in truncation. It must be
+with magnitude ``1/np.finfo(np.double).eps`` (approximately :math:`4.5 \times 10^{15}`),
+this results in truncation. It must be
 noted, that large values may have an impact on windows, which do not include these values. `Kahan summation
 <https://en.wikipedia.org/wiki/Kahan_summation_algorithm>`__ is used
 to compute the rolling sums to preserve accuracy as much as possible.
@@ -356,11 +357,11 @@ See :ref:`enhancing performance with Numba <enhancingperf.numba>` for general us

 Numba will be applied in potentially two routines:

-#. If ``func`` is a standard Python function, the engine will `JIT <https://numba.pydata.org/numba-doc/latest/user/overview.html>`__ the passed function. ``func`` can also be a JITed function in which case the engine will not JIT the function again.
+#. If ``func`` is a standard Python function, the engine will `JIT <https://numba.readthedocs.io/en/stable/user/overview.html>`__ the passed function. ``func`` can also be a JITed function in which case the engine will not JIT the function again.
 #. The engine will JIT the for loop where the apply function is applied to each window.

 The ``engine_kwargs`` argument is a dictionary of keyword arguments that will be passed into the
-`numba.jit decorator <https://numba.pydata.org/numba-doc/latest/reference/jit-compilation.html#numba.jit>`__.
+`numba.jit decorator <https://numba.readthedocs.io/en/stable/user/jit.html>`__.
 These keyword arguments will be applied to *both* the passed function (if a standard Python function)
 and the apply for loop over each window.
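
Note on the numerical-imprecision warning in window.rst: the sketch below illustrates, with the standard library only, why a large value can affect a window that no longer contains it. It is not the pandas implementation (the docs state that pandas uses Kahan summation); `naive_rolling_sum` and `exact_rolling_sum` are hypothetical helpers, and `sys.float_info.epsilon` stands in for `np.finfo(np.double).eps`.

```python
import math
import sys


def naive_rolling_sum(values, window):
    """Rolling sum with one running total: add the entering value, subtract the leaving one."""
    total = 0.0
    out = []
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]
        out.append(total if i >= window - 1 else None)
    return out


def exact_rolling_sum(values, window):
    """Reference result: re-sum every window from scratch with math.fsum."""
    return [
        math.fsum(values[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


# The threshold quoted in the docs, 1/eps for double precision, is 2**52 (about 4.5e15).
threshold = 1 / sys.float_info.epsilon

values = [0.1, 1e16, 0.1, 0.1]
naive = naive_rolling_sum(values, window=2)
exact = exact_rolling_sum(values, window=2)

if __name__ == "__main__":
    print(threshold)
    print(naive[-1], exact[-1])
```

The last window holds only `0.1` and `0.1`, yet the running-total version reports `0.0` instead of `0.2`: the small values were absorbed when `1e16` entered the total and could not be recovered when it left.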
