Commit fd2939b

Merge branch 'main' into sas/shlookup
2 parents 20f7e16 + bedd8f0 commit fd2939b

File tree

109 files changed

+1427
-445
lines changed

.pre-commit-config.yaml

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -94,8 +94,6 @@ repos:
9494
stages: [manual]
9595
additional_dependencies: &pyright_dependencies
9696
- pyright@1.1.258
97-
- repo: local
98-
hooks:
9997
- id: pyright_reportGeneralTypeIssues
10098
name: pyright reportGeneralTypeIssues
10199
entry: pyright --skipunannotated -p pyright_reportGeneralTypeIssues.json
@@ -105,8 +103,6 @@ repos:
105103
types: [python]
106104
stages: [manual]
107105
additional_dependencies: *pyright_dependencies
108-
- repo: local
109-
hooks:
110106
- id: mypy
111107
name: mypy
112108
entry: mypy
@@ -115,8 +111,6 @@ repos:
115111
pass_filenames: false
116112
types: [python]
117113
stages: [manual]
118-
- repo: local
119-
hooks:
120114
- id: flake8-rst
121115
name: flake8-rst
122116
description: Run flake8 on code snippets in docstrings or RST files
@@ -237,3 +231,15 @@ repos:
237231
additional_dependencies:
238232
- flake8==4.0.1
239233
- flake8-pyi==22.5.1
234+
- id: future-annotations
235+
name: import annotations from __future__
236+
entry: 'from __future__ import annotations'
237+
language: pygrep
238+
args: [--negate]
239+
files: ^pandas/
240+
types: [python]
241+
exclude: |
242+
(?x)
243+
/(__init__\.py)|(api\.py)|(_version\.py)|(testing\.py)|(conftest\.py)$
244+
|/tests/
245+
|/_testing/
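
The `future-annotations` hook added above uses pre-commit's `pygrep` language with `--negate`, so it reports files in which no line matches the `entry` pattern. A minimal Python sketch of that selection logic follows, assuming `re.search` semantics for `files` and `exclude`. The helper name `hook_would_flag` is hypothetical, and the `types: [python]` filter is not modeled.

```python
import re

# Patterns copied from the hook definition in the diff.
ENTRY = re.compile(r"from __future__ import annotations")
FILES = re.compile(r"^pandas/")
EXCLUDE = re.compile(
    r"""(?x)
    /(__init__\.py)|(api\.py)|(_version\.py)|(testing\.py)|(conftest\.py)$
    |/tests/
    |/_testing/
    """
)


def hook_would_flag(path: str, source: str) -> bool:
    """Return True if the negated grep would report this file (hypothetical helper)."""
    # Files outside ^pandas/ or matching the exclude pattern are never checked.
    if not FILES.search(path) or EXCLUDE.search(path):
        return False
    # --negate: the file is reported when *no* line matches the entry pattern.
    return not any(ENTRY.search(line) for line in source.splitlines())


print(hook_would_flag("pandas/_config/dates.py", "import os\n"))
print(hook_would_flag("pandas/tests/io/test_sql.py", "import os\n"))
```

Under these assumptions the first call reports the file (it lacks the import) and the second does not (the path is under `/tests/`), which matches why `pandas/_config/dates.py` gains the import later in this commit.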

doc/source/getting_started/install.rst

Lines changed: 22 additions & 22 deletions
@@ -199,7 +199,7 @@ the code base as of this writing. To run it on your machine to verify that
199199
everything is working (and that you have all of the dependencies, soft and hard,
200200
installed), make sure you have `pytest
201201
<https://docs.pytest.org/en/latest/>`__ >= 6.0 and `Hypothesis
202-
<https://hypothesis.readthedocs.io/en/latest/>`__ >= 3.58, then run:
202+
<https://hypothesis.readthedocs.io/en/latest/>`__ >= 6.13.0, then run:
203203

204204
::
205205

@@ -247,11 +247,11 @@ Recommended dependencies
247247

248248
* `numexpr <https://github.com/pydata/numexpr>`__: for accelerating certain numerical operations.
249249
``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups.
250-
If installed, must be Version 2.7.1 or higher.
250+
If installed, must be Version 2.7.3 or higher.
251251

252252
* `bottleneck <https://github.com/pydata/bottleneck>`__: for accelerating certain types of ``nan``
253253
evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed,
254-
must be Version 1.3.1 or higher.
254+
must be Version 1.3.2 or higher.
255255

256256
.. note::
257257

@@ -277,8 +277,8 @@ Visualization
277277
Dependency Minimum Version Notes
278278
========================= ================== =============================================================
279279
matplotlib 3.3.2 Plotting library
280-
Jinja2 2.11 Conditional formatting with DataFrame.style
281-
tabulate 0.8.7 Printing in Markdown-friendly format (see `tabulate`_)
280+
Jinja2 3.0.0 Conditional formatting with DataFrame.style
281+
tabulate 0.8.9 Printing in Markdown-friendly format (see `tabulate`_)
282282
========================= ================== =============================================================
283283

284284
Computation
@@ -287,10 +287,10 @@ Computation
287287
========================= ================== =============================================================
288288
Dependency Minimum Version Notes
289289
========================= ================== =============================================================
290-
SciPy 1.4.1 Miscellaneous statistical functions
291-
numba 0.50.1 Alternative execution engine for rolling operations
290+
SciPy 1.7.1 Miscellaneous statistical functions
291+
numba 0.53.1 Alternative execution engine for rolling operations
292292
(see :ref:`Enhancing Performance <enhancingperf.numba>`)
293-
xarray 0.15.1 pandas-like API for N-dimensional data
293+
xarray 0.19.0 pandas-like API for N-dimensional data
294294
========================= ================== =============================================================
295295

296296
Excel files
@@ -301,9 +301,9 @@ Dependency Minimum Version Notes
301301
========================= ================== =============================================================
302302
xlrd 2.0.1 Reading Excel
303303
xlwt 1.3.0 Writing Excel
304-
xlsxwriter 1.2.2 Writing Excel
305-
openpyxl 3.0.3 Reading / writing for xlsx files
306-
pyxlsb 1.0.6 Reading for xlsb files
304+
xlsxwriter 1.4.3 Writing Excel
305+
openpyxl 3.0.7 Reading / writing for xlsx files
306+
pyxlsb 1.0.8 Reading for xlsb files
307307
========================= ================== =============================================================
308308

309309
HTML
@@ -312,9 +312,9 @@ HTML
312312
========================= ================== =============================================================
313313
Dependency Minimum Version Notes
314314
========================= ================== =============================================================
315-
BeautifulSoup4 4.8.2 HTML parser for read_html
315+
BeautifulSoup4 4.9.3 HTML parser for read_html
316316
html5lib 1.1 HTML parser for read_html
317-
lxml 4.5.0 HTML parser for read_html
317+
lxml 4.6.3 HTML parser for read_html
318318
========================= ================== =============================================================
319319

320320
One of the following combinations of libraries is needed to use the
@@ -356,9 +356,9 @@ SQL databases
356356
========================= ================== =============================================================
357357
Dependency Minimum Version Notes
358358
========================= ================== =============================================================
359-
SQLAlchemy 1.4.0 SQL support for databases other than sqlite
360-
psycopg2 2.8.4 PostgreSQL engine for sqlalchemy
361-
pymysql 0.10.1 MySQL engine for sqlalchemy
359+
SQLAlchemy 1.4.16 SQL support for databases other than sqlite
360+
psycopg2 2.8.6 PostgreSQL engine for sqlalchemy
361+
pymysql 1.0.2 MySQL engine for sqlalchemy
362362
========================= ================== =============================================================
363363

364364
Other data sources
@@ -368,11 +368,11 @@ Other data sources
368368
Dependency Minimum Version Notes
369369
========================= ================== =============================================================
370370
PyTables 3.6.1 HDF5-based reading / writing
371-
blosc 1.20.1 Compression for HDF5
371+
blosc 1.21.0 Compression for HDF5
372372
zlib Compression for HDF5
373373
fastparquet 0.4.0 Parquet reading / writing
374374
pyarrow 1.0.1 Parquet, ORC, and feather reading / writing
375-
pyreadstat 1.1.0 SPSS files (.sav) reading
375+
pyreadstat 1.1.2 SPSS files (.sav) reading
376376
========================= ================== =============================================================
377377

378378
.. _install.warn_orc:
@@ -396,10 +396,10 @@ Access data in the cloud
396396
========================= ================== =============================================================
397397
Dependency Minimum Version Notes
398398
========================= ================== =============================================================
399-
fsspec 0.7.4 Handling files aside from simple local and HTTP
400-
gcsfs 0.6.0 Google Cloud Storage access
401-
pandas-gbq 0.14.0 Google Big Query access
402-
s3fs 0.4.0 Amazon S3 access
399+
fsspec 2021.5.0 Handling files aside from simple local and HTTP
400+
gcsfs 2021.5.0 Google Cloud Storage access
401+
pandas-gbq 0.15.0 Google Big Query access
402+
s3fs 2021.05.0 Amazon S3 access
403403
========================= ================== =============================================================
404404

405405
Clipboard
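
The raised minimums in the tables above can be checked against a local environment. The following is a rough sketch using only the standard library, not pandas' own optional-dependency check. The `MINIMUMS` subset, `as_tuple`, `meets_minimum`, and `report` are illustrative names, and the version parsing is deliberately crude (pre-release suffixes are ignored).

```python
from __future__ import annotations

import re
from importlib import metadata

# A few minimums copied from the updated tables; not the full list.
MINIMUMS = {
    "numexpr": "2.7.3",
    "bottleneck": "1.3.2",
    "scipy": "1.7.1",
    "fsspec": "2021.5.0",
}


def as_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2021.05.0' into (2021, 5, 0); stop at the first non-numeric piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad with zeros so that '3.0' and '3.0.0' compare equal.
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    return as_tuple(installed) >= as_tuple(minimum)


def report() -> dict[str, str]:
    """Map each listed package to 'ok', 'not installed', or a 'too old' message."""
    status = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            status[name] = "ok"
        else:
            status[name] = f"too old ({installed} < {minimum})"
    return status


print(report())
```

For real use, a dedicated version parser such as `packaging.version` handles pre-releases and local versions more robustly than the tuple comparison shown here.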

doc/source/user_guide/groupby.rst

Lines changed: 11 additions & 0 deletions
@@ -345,6 +345,17 @@ Index level names may be supplied as keys.
345345
346346
More on the ``sum`` function and aggregation later.
347347

348+
When using ``.groupby()`` on a DataFrame with a MultiIndex, do not specify both ``by`` and ``level``.
349+
The argument validation should be done in ``.groupby()``, using the name of the specific index.
350+
351+
.. ipython:: python
352+
353+
df = pd.DataFrame({"col1": ["a", "b", "c"]})
354+
df.index = pd.MultiIndex.from_arrays([["a", "a", "b"],
355+
[1, 2, 1]],
356+
names=["x", "y"])
357+
df.groupby(["col1", "x"])
358+
348359
Grouping DataFrame with Index levels and columns
349360
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
350361
A DataFrame may be grouped by a combination of columns and index levels by

doc/source/whatsnew/v1.4.4.rst

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ including other versions of pandas.
1515
Fixed regressions
1616
~~~~~~~~~~~~~~~~~
1717
- Fixed regression in :func:`concat` materializing :class:`Index` during sorting even if :class:`Index` was already sorted (:issue:`47501`)
18+
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
1819
-
1920

2021
.. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.5.0.rst

Lines changed: 6 additions & 1 deletion
@@ -278,6 +278,7 @@ Other enhancements
278278
- :meth:`DatetimeIndex.astype` now supports casting timezone-naive indexes to ``datetime64[s]``, ``datetime64[ms]``, and ``datetime64[us]``, and timezone-aware indexes to the corresponding ``datetime64[unit, tzname]`` dtypes (:issue:`47579`)
279279
- :class:`Series` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) will now successfully operate when the dtype is numeric and ``numeric_only=True`` is provided; previously this would raise a ``NotImplementedError`` (:issue:`47500`)
280280
- :meth:`RangeIndex.union` now can return a :class:`RangeIndex` instead of a :class:`Int64Index` if the resulting values are equally spaced (:issue:`47557`, :issue:`43885`)
281+
- :meth:`DataFrame.compare` now accepts an argument ``result_names`` to allow the user to specify the result's names of both left and right DataFrame which are being compared. This is by default ``'self'`` and ``'other'`` (:issue:`44354`)
281282

282283
.. ---------------------------------------------------------------------------
283284
.. _whatsnew_150.notable_bug_fixes:
@@ -845,7 +846,7 @@ Numeric
845846
- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`)
846847
- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`)
847848
- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`)
848-
-
849+
- Bug in :meth:`mean` where the optional dependency ``bottleneck`` causes precision loss linear in the length of the array. ``bottleneck`` has been disabled for :meth:`mean` improving the loss to log-linear but may result in a performance decrease. (:issue:`42878`)
849850

850851
Conversion
851852
^^^^^^^^^^
@@ -913,6 +914,7 @@ Missing
913914
^^^^^^^
914915
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`)
915916
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`)
917+
- Bug in :meth:`Series.map` not respecting ``na_action`` argument if mapper is a ``dict`` or :class:`Series` (:issue:`47527`)
916918
- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`)
917919
- Bug in :meth:`DataFrame.dropna` allows to set both ``how`` and ``thresh`` incompatible arguments (:issue:`46575`)
918920
- Bug in :meth:`DataFrame.fillna` ignored ``axis`` when :class:`DataFrame` is single block (:issue:`47713`)
@@ -955,6 +957,7 @@ I/O
955957
- Bug in :func:`read_sas` that scrambled column names (:issue:`31243`)
956958
- Bug in :func:`read_sas` with RLE-compressed SAS7BDAT files that contain 0x00 control bytes (:issue:`47099`)
957959
- Bug in :func:`read_parquet` with ``use_nullable_dtypes=True`` where ``float64`` dtype was returned instead of nullable ``Float64`` dtype (:issue:`45694`)
960+
- Bug in :meth:`DataFrame.to_json` where ``PeriodDtype`` would not make the serialization roundtrip when read back with :meth:`read_json` (:issue:`44720`)
958961

959962
Period
960963
^^^^^^
@@ -977,6 +980,7 @@ Plotting
977980
- The function :meth:`DataFrame.plot.scatter` now accepts ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` for consistency to other plotting functions (:issue:`44670`)
978981
- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`)
979982
- Bug in :meth:`DataFrame.plot` that led to xticks and vertical grids being improperly placed when plotting a quarterly series (:issue:`47602`)
983+
- Bug in :meth:`DataFrame.plot` that prevented setting y-axis label, limits and ticks for a secondary y-axis (:issue:`47753`)
980984

981985
Groupby/resample/rolling
982986
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1018,6 +1022,7 @@ Reshaping
10181022
- Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)
10191023
- Bug in :meth:`DataFrame.pivot_table` with ``sort=False`` results in sorted index (:issue:`17041`)
10201024
- Bug in :meth:`concat` when ``axis=1`` and ``sort=False`` where the resulting Index was a :class:`Int64Index` instead of a :class:`RangeIndex` (:issue:`46675`)
1025+
- Bug in :meth:`wide_to_long` raises when ``stubnames`` is missing in columns and ``i`` contains string dtype column (:issue:`46044`)
10211026

10221027
Sparse
10231028
^^^^^^
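
The ``mean`` entry in the Numeric section above contrasts error that grows linearly with array length against error that grows far more slowly. The entry does not say which algorithm takes over once ``bottleneck`` is disabled. The sketch below assumes pairwise summation (the strategy NumPy uses for floating-point reductions) purely to illustrate the difference, in plain Python with hypothetical helper names.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; rounding error can grow with len(values)."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values, lo=0, hi=None):
    """Recursively sum halves; the error bound grows with log(len(values))."""
    if hi is None:
        hi = len(values)
    if hi - lo == 0:
        return 0.0
    if hi - lo == 1:
        return values[lo]
    mid = (lo + hi) // 2
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi)


values = [0.1] * 2**16
exact = math.fsum(values)  # correctly rounded reference sum

naive_error = abs(naive_sum(values) - exact)
pairwise_error = abs(pairwise_sum(values) - exact)
print(f"naive error:    {naive_error:.3e}")
print(f"pairwise error: {pairwise_error:.3e}")
```

Dividing either sum by ``len(values)`` gives the corresponding mean, so the accuracy of the sum carries over directly to the accuracy of the mean.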

environment.yml

Lines changed: 1 addition & 1 deletion
@@ -127,4 +127,4 @@ dependencies:
127127
# build the interactive terminal
128128
- jupyterlab >=3.4,<4
129129
- pip:
130-
- jupyterlite==0.1.0b9
130+
- jupyterlite==0.1.0b10

pandas/_config/dates.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
"""
22
config for datetime formatting
33
"""
4+
from __future__ import annotations
5+
46
from pandas._config import config as cf
57

68
pc_date_dayfirst_doc = """

pandas/_libs/algos.pyi

Lines changed: 1 addition & 8 deletions
@@ -42,7 +42,7 @@ def groupsort_indexer(
4242
np.ndarray, # ndarray[int64_t, ndim=1]
4343
]: ...
4444
def kth_smallest(
45-
a: np.ndarray, # numeric[:]
45+
arr: np.ndarray, # numeric[:]
4646
k: int,
4747
) -> Any: ... # numeric
4848

@@ -129,18 +129,11 @@ def diff_2d(
129129
) -> None: ...
130130
def ensure_platform_int(arr: object) -> npt.NDArray[np.intp]: ...
131131
def ensure_object(arr: object) -> npt.NDArray[np.object_]: ...
132-
def ensure_complex64(arr: object, copy=...) -> npt.NDArray[np.complex64]: ...
133-
def ensure_complex128(arr: object, copy=...) -> npt.NDArray[np.complex128]: ...
134132
def ensure_float64(arr: object, copy=...) -> npt.NDArray[np.float64]: ...
135-
def ensure_float32(arr: object, copy=...) -> npt.NDArray[np.float32]: ...
136133
def ensure_int8(arr: object, copy=...) -> npt.NDArray[np.int8]: ...
137134
def ensure_int16(arr: object, copy=...) -> npt.NDArray[np.int16]: ...
138135
def ensure_int32(arr: object, copy=...) -> npt.NDArray[np.int32]: ...
139136
def ensure_int64(arr: object, copy=...) -> npt.NDArray[np.int64]: ...
140-
def ensure_uint8(arr: object, copy=...) -> npt.NDArray[np.uint8]: ...
141-
def ensure_uint16(arr: object, copy=...) -> npt.NDArray[np.uint16]: ...
142-
def ensure_uint32(arr: object, copy=...) -> npt.NDArray[np.uint32]: ...
143-
def ensure_uint64(arr: object, copy=...) -> npt.NDArray[np.uint64]: ...
144137
def take_1d_int8_int8(
145138
values: np.ndarray, indexer: npt.NDArray[np.intp], out: np.ndarray, fill_value=...
146139
) -> None: ...

pandas/_libs/groupby.pyi

Lines changed: 13 additions & 3 deletions
@@ -105,26 +105,28 @@ def group_last(
105105
values: np.ndarray, # ndarray[rank_t, ndim=2]
106106
labels: np.ndarray, # const int64_t[:]
107107
mask: npt.NDArray[np.bool_] | None,
108-
result_mask: npt.NDArray[np.bool_] | None,
108+
result_mask: npt.NDArray[np.bool_] | None = ...,
109109
min_count: int = ..., # Py_ssize_t
110+
is_datetimelike: bool = ...,
110111
) -> None: ...
111112
def group_nth(
112113
out: np.ndarray, # rank_t[:, ::1]
113114
counts: np.ndarray, # int64_t[::1]
114115
values: np.ndarray, # ndarray[rank_t, ndim=2]
115116
labels: np.ndarray, # const int64_t[:]
116117
mask: npt.NDArray[np.bool_] | None,
117-
result_mask: npt.NDArray[np.bool_] | None,
118+
result_mask: npt.NDArray[np.bool_] | None = ...,
118119
min_count: int = ..., # int64_t
119120
rank: int = ..., # int64_t
121+
is_datetimelike: bool = ...,
120122
) -> None: ...
121123
def group_rank(
122124
out: np.ndarray, # float64_t[:, ::1]
123125
values: np.ndarray, # ndarray[rank_t, ndim=2]
124126
labels: np.ndarray, # const int64_t[:]
125127
ngroups: int,
126128
is_datetimelike: bool,
127-
ties_method: Literal["aveage", "min", "max", "first", "dense"] = ...,
129+
ties_method: Literal["average", "min", "max", "first", "dense"] = ...,
128130
ascending: bool = ...,
129131
pct: bool = ...,
130132
na_option: Literal["keep", "top", "bottom"] = ...,
@@ -136,6 +138,7 @@ def group_max(
136138
values: np.ndarray, # ndarray[groupby_t, ndim=2]
137139
labels: np.ndarray, # const int64_t[:]
138140
min_count: int = ...,
141+
is_datetimelike: bool = ...,
139142
mask: np.ndarray | None = ...,
140143
result_mask: np.ndarray | None = ...,
141144
) -> None: ...
@@ -145,6 +148,7 @@ def group_min(
145148
values: np.ndarray, # ndarray[groupby_t, ndim=2]
146149
labels: np.ndarray, # const int64_t[:]
147150
min_count: int = ...,
151+
is_datetimelike: bool = ...,
148152
mask: np.ndarray | None = ...,
149153
result_mask: np.ndarray | None = ...,
150154
) -> None: ...
@@ -154,11 +158,17 @@ def group_cummin(
154158
labels: np.ndarray, # const int64_t[:]
155159
ngroups: int,
156160
is_datetimelike: bool,
161+
mask: np.ndarray | None = ...,
162+
result_mask: np.ndarray | None = ...,
163+
skipna: bool = ...,
157164
) -> None: ...
158165
def group_cummax(
159166
out: np.ndarray, # groupby_t[:, ::1]
160167
values: np.ndarray, # ndarray[groupby_t, ndim=2]
161168
labels: np.ndarray, # const int64_t[:]
162169
ngroups: int,
163170
is_datetimelike: bool,
171+
mask: np.ndarray | None = ...,
172+
result_mask: np.ndarray | None = ...,
173+
skipna: bool = ...,
164174
) -> None: ...
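
The ``ties_method`` correction in ``group_rank`` above matters because a type checker compares string arguments against the ``Literal`` members: with the misspelled stub, ``"average"`` would be rejected and ``"aveage"`` accepted. A small sketch of the same idea at runtime, assuming a hypothetical alias ``TiesMethod`` and helper ``check_ties_method`` (neither exists in pandas; the real validation is not shown in this diff):

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the corrected stub annotation.
TiesMethod = Literal["average", "min", "max", "first", "dense"]


def check_ties_method(value: str) -> str:
    """Raise ValueError unless value is one of the Literal members."""
    allowed = get_args(TiesMethod)
    if value not in allowed:
        raise ValueError(f"ties_method must be one of {allowed}, got {value!r}")
    return value


print(check_ties_method("average"))

try:
    check_ties_method("aveage")  # the misspelling from the old stub
except ValueError as err:
    print(err)
```

Deriving the allowed values from the ``Literal`` with ``get_args`` keeps the static annotation and the runtime check from drifting apart.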

pandas/_libs/internals.pyi

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ def update_blklocs_and_blknos(
3232
loc: int,
3333
nblocks: int,
3434
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
35-
35+
@final
3636
class BlockPlacement:
3737
def __init__(self, val: int | slice | np.ndarray): ...
3838
@property
