
Commit a5a6691

Merge branch 'main' into feat/pydantic-protocol

2 parents fa54b82 + 431dd6f · commit a5a6691

37 files changed: +365 -246 lines

.pre-commit-config.yaml

Lines changed: 5 additions & 17 deletions

@@ -15,18 +15,11 @@ default_stages: [
 ci:
     autofix_prs: false
 repos:
--   repo: local
+-   repo: https://github.com/hauntsaninja/black-pre-commit-mirror
+    # black compiled with mypyc
+    rev: 23.3.0
     hooks:
-    # NOTE: we make `black` a local hook because if it's installed from
-    # PyPI (rather than from source) then it'll run twice as fast thanks to mypyc
-    -   id: black
-        name: black
-        description: "Black: The uncompromising Python code formatter"
-        entry: black
-        language: python
-        require_serial: true
-        types_or: [python, pyi]
-        additional_dependencies: [black==23.3.0]
+    -   id: black
 -   repo: https://github.com/charliermarsh/ruff-pre-commit
     rev: v0.0.270
     hooks:
@@ -74,7 +67,7 @@ repos:
             --linelength=88,
             '--filter=-readability/casting,-runtime/int,-build/include_subdir,-readability/fn_size'
             ]
--   repo: https://github.com/pycqa/pylint
+-   repo: https://github.com/pylint-dev/pylint
     rev: v3.0.0a6
     hooks:
     -   id: pylint
@@ -93,11 +86,6 @@ repos:
             |^pandas/conftest\.py # keep excluded
         args: [--disable=all, --enable=redefined-outer-name]
         stages: [manual]
-    -   id: pylint
-        alias: unspecified-encoding
-        name: Using open without explicitly specifying an encoding
-        args: [--disable=all, --enable=unspecified-encoding]
-        stages: [manual]
 -   repo: https://github.com/PyCQA/isort
     rev: 5.12.0
     hooks:

ci/code_checks.sh

Lines changed: 0 additions & 6 deletions

@@ -110,12 +110,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas_object \
         pandas.api.interchange.from_dataframe \
         pandas.DatetimeIndex.snap \
-        pandas.core.window.ewm.ExponentialMovingWindow.mean \
-        pandas.core.window.ewm.ExponentialMovingWindow.sum \
-        pandas.core.window.ewm.ExponentialMovingWindow.std \
-        pandas.core.window.ewm.ExponentialMovingWindow.var \
-        pandas.core.window.ewm.ExponentialMovingWindow.corr \
-        pandas.core.window.ewm.ExponentialMovingWindow.cov \
         pandas.api.indexers.BaseIndexer \
         pandas.api.indexers.VariableOffsetWindowIndexer \
         pandas.io.formats.style.Styler \

doc/source/user_guide/10min.rst

Lines changed: 10 additions & 0 deletions

@@ -16,6 +16,16 @@ Customarily, we import as follows:
    import numpy as np
    import pandas as pd

+Basic data structures in pandas
+-------------------------------
+
+Pandas provides two types of classes for handling data:
+
+1. :class:`Series`: a one-dimensional labeled array holding data of any type
+   such as integers, strings, Python objects etc.
+2. :class:`DataFrame`: a two-dimensional data structure that holds data like
+   a two-dimension array or a table with rows and columns.
+
 Object creation
 ---------------
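The new guide section names the two core containers. As a quick illustration of what it describes (not part of the diff itself), both can be built directly from Python or NumPy data:

```python
import numpy as np
import pandas as pd

# Series: one-dimensional labeled array, holding any dtype
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# DataFrame: two-dimensional, with labeled rows and columns
df = pd.DataFrame(np.random.randn(4, 3), columns=["A", "B", "C"])

print(s.dtype)   # float64
print(df.shape)  # (4, 3)
```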

doc/source/user_guide/io.rst

Lines changed: 7 additions & 8 deletions

@@ -1568,8 +1568,7 @@ class of the csv module. For this, you have to specify ``sep=None``.
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10, 4))
-   df.to_csv("tmp.csv", sep="|")
-   df.to_csv("tmp2.csv", sep=":")
+   df.to_csv("tmp2.csv", sep=":", index=False)
    pd.read_csv("tmp2.csv", sep=None, engine="python")

 .. ipython:: python
@@ -1597,8 +1596,8 @@ rather than reading the entire file into memory, such as the following:
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10, 4))
-   df.to_csv("tmp.csv", sep="|")
-   table = pd.read_csv("tmp.csv", sep="|")
+   df.to_csv("tmp.csv", index=False)
+   table = pd.read_csv("tmp.csv")
    table

@@ -1607,8 +1606,8 @@ value will be an iterable object of type ``TextFileReader``:

 .. ipython:: python

-   with pd.read_csv("tmp.csv", sep="|", chunksize=4) as reader:
-       reader
+   with pd.read_csv("tmp.csv", chunksize=4) as reader:
+       print(reader)
        for chunk in reader:
            print(chunk)

@@ -1620,8 +1619,8 @@ Specifying ``iterator=True`` will also return the ``TextFileReader`` object:

 .. ipython:: python

-   with pd.read_csv("tmp.csv", sep="|", iterator=True) as reader:
-       reader.get_chunk(5)
+   with pd.read_csv("tmp.csv", iterator=True) as reader:
+       print(reader.get_chunk(5))

 .. ipython:: python
    :suppress:
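The edited examples drop the unused `sep="|"` and print the reader objects explicitly. A self-contained sketch of the same chunked-reading pattern (the file name `tmp.csv` is just a scratch path):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))
df.to_csv("tmp.csv", index=False)

# chunksize returns a TextFileReader that works as a context manager
with pd.read_csv("tmp.csv", chunksize=4) as reader:
    for chunk in reader:
        print(chunk.shape)  # (4, 4), (4, 4), (2, 4)

# iterator=True returns the same object; pull rows manually instead
with pd.read_csv("tmp.csv", iterator=True) as reader:
    print(reader.get_chunk(5))
```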

doc/source/whatsnew/v2.1.0.rst

Lines changed: 9 additions & 0 deletions

@@ -27,6 +27,14 @@ Copy-on-Write improvements
   of those Index objects for the columns of the DataFrame (:issue:`52947`)
 - Add lazy copy mechanism to :meth:`DataFrame.eval` (:issue:`53746`)

+- Trying to operate inplace on a temporary column selection
+  (for example, ``df["a"].fillna(100, inplace=True)``)
+  will now always raise a warning when Copy-on-Write is enabled. In this mode,
+  operating inplace like this will never work, since the selection behaves
+  as a temporary copy. This holds true for:
+
+  - DataFrame.fillna / Series.fillna
+
 .. _whatsnew_210.enhancements.enhancement2:

 ``map(func, na_action="ignore")`` now works for all array types
@@ -241,6 +249,7 @@ Other API changes
 Deprecations
 ~~~~~~~~~~~~
 - Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
+- Deprecated 'downcast' keyword in :meth:`Index.fillna` (:issue:`53956`)
 - Deprecated 'fill_method' and 'limit' keywords in :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`DataFrameGroupBy.pct_change`, and :meth:`SeriesGroupBy.pct_change`, explicitly call ``ffill`` or ``bfill`` before calling ``pct_change`` instead (:issue:`53491`)
 - Deprecated 'method', 'limit', and 'fill_axis' keywords in :meth:`DataFrame.align` and :meth:`Series.align`, explicitly call ``fillna`` on the alignment results instead (:issue:`51856`)
 - Deprecated 'quantile' keyword in :meth:`Rolling.quantile` and :meth:`Expanding.quantile`, renamed as 'q' instead (:issue:`52550`)
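To make the new Copy-on-Write note concrete, here is a minimal sketch of the pattern it warns about and the usual replacement (assuming the `mode.copy_on_write` option available since pandas 2.0):

```python
import pandas as pd

pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})

# Under Copy-on-Write, df["a"] behaves as a temporary copy, so an inplace
# fillna on it can never reach df and now always warns:
# df["a"].fillna(100, inplace=True)

# Assign back to the parent object instead:
df["a"] = df["a"].fillna(100)
```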

environment.yml

Lines changed: 5 additions & 10 deletions

@@ -17,7 +17,6 @@ dependencies:
   - pytest-cov
   - pytest-xdist>=2.2.0
   - pytest-asyncio>=0.17.0
-  - pytest-localserver>=0.7.1
   - coverage

   # required dependencies
@@ -40,7 +39,7 @@ dependencies:
   - lxml>=4.8.0
   - matplotlib>=3.6.1
   - numba>=0.55.2
-  - numexpr>=2.8.0 # pin for "Run checks on imported code" job
+  - numexpr>=2.8.0
   - openpyxl>=3.0.10
   - odfpy>=1.4.1
   - py
@@ -76,14 +75,10 @@ dependencies:
   - cxx-compiler

   # code checks
-  - black=23.3.0
-  - cpplint
-  - flake8=6.0.0
-  - isort>=5.2.1 # check that imports are in the right order
-  - mypy=1.2
+  - flake8=6.0.0 # run in subprocess over docstring examples
+  - mypy=1.2 # pre-commit uses locally installed mypy
+  - tokenize-rt # scripts/check_for_inconsistent_pandas_namespace.py
   - pre-commit>=2.15.0
-  - pyupgrade
-  - ruff=0.0.215

   # documentation
   - gitpython # obtain contributors from git for whatsnew
@@ -119,6 +114,6 @@ dependencies:
   - pygments # Code highlighting

   - pip:
-      - sphinx-toggleprompt
+      - sphinx-toggleprompt # conda-forge version has stricter pins on jinja2
       - typing_extensions; python_version<"3.11"
       - tzdata>=2022.1

pandas/_libs/tslibs/parsing.pyx

Lines changed: 1 addition & 1 deletion

@@ -704,7 +704,7 @@ cdef datetime dateutil_parse(
             # we get tzlocal, once the deprecation is enforced will get
             # timezone.utc, not raise.
            warnings.warn(
-                "Parsing '{res.tzname}' as tzlocal (dependent on system timezone) "
+                f"Parsing '{res.tzname}' as tzlocal (dependent on system timezone) "
                 "is deprecated and will raise in a future version. Pass the 'tz' "
                 "keyword or call tz_localize after construction instead",
                 FutureWarning,
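The one-character fix above matters because of implicit string concatenation: only segments carrying the `f` prefix are interpolated, so the old message showed the literal `{res.tzname}`. A standalone illustration (the `_Result` class is made up for the demo):

```python
class _Result:
    tzname = "CET"

res = _Result()

broken = (
    "Parsing '{res.tzname}' as tzlocal "  # no f-prefix: braces stay literal
    "is deprecated"
)
fixed = (
    f"Parsing '{res.tzname}' as tzlocal "  # f-prefix: interpolated at runtime
    "is deprecated"
)

print(broken)  # Parsing '{res.tzname}' as tzlocal is deprecated
print(fixed)   # Parsing 'CET' as tzlocal is deprecated
```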

pandas/api/typing/__init__.py

Lines changed: 5 additions & 0 deletions

@@ -2,6 +2,9 @@
 Public API classes that store intermediate results useful for type-hinting.
 """

+from pandas._libs import NaTType
+from pandas._libs.missing import NAType
+
 from pandas.core.groupby import (
     DataFrameGroupBy,
     SeriesGroupBy,
@@ -36,6 +39,8 @@
     "ExponentialMovingWindow",
     "ExponentialMovingWindowGroupby",
     "JsonReader",
+    "NaTType",
+    "NAType",
     "PeriodIndexResamplerGroupby",
     "Resampler",
     "Rolling",

pandas/core/arrays/arrow/array.py

Lines changed: 0 additions & 2 deletions

@@ -2026,8 +2026,6 @@ def _str_repeat(self, repeats: int | Sequence[int]):
             raise NotImplementedError(
                 f"repeat is not implemented when repeats is {type(repeats).__name__}"
             )
-        elif pa_version_under7p0:
-            raise NotImplementedError("repeat is not implemented for pyarrow < 7")
         else:
             return type(self)(pc.binary_repeat(self._pa_array, repeats))

pandas/core/arrays/datetimelike.py

Lines changed: 9 additions & 1 deletion

@@ -2211,7 +2211,15 @@ def factorize(
             codes = codes[::-1]
             uniques = uniques[::-1]
             return codes, uniques
-        # FIXME: shouldn't get here; we are ignoring sort
+
+        if sort:
+            # algorithms.factorize only passes sort=True here when freq is
+            # not None, so this should not be reached.
+            raise NotImplementedError(
+                f"The 'sort' keyword in {type(self).__name__}.factorize is "
+                "ignored unless arr.freq is not None. To factorize with sort, "
+                "call pd.factorize(obj, sort=True) instead."
+            )
         return super().factorize(use_na_sentinel=use_na_sentinel)

     @classmethod
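The replaced FIXME becomes an explicit error whose message points at the top-level function, which does honor sorting for datetime-like data. A small sketch of the suggested call:

```python
import pandas as pd

dti = pd.DatetimeIndex(["2023-03-01", "2023-01-01", "2023-02-01"])

# pd.factorize handles sort=True itself, regardless of freq
codes, uniques = pd.factorize(dti, sort=True)
print(codes)    # [2 0 1]
print(uniques)  # DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], ...)
```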

pandas/core/arrays/numpy_.py

Lines changed: 1 addition & 1 deletion

@@ -247,7 +247,7 @@ def pad_or_backfill(

         meth = missing.clean_fill_method(method)
         missing.pad_or_backfill_inplace(
-            out_data,
+            out_data.T,
             method=meth,
             axis=0,
             limit=limit,

pandas/core/arrays/string_arrow.py

Lines changed: 16 additions & 13 deletions

@@ -307,28 +307,31 @@ def _str_contains(
             return super()._str_contains(pat, case, flags, na, regex)

         if regex:
-            if case is False:
-                fallback_performancewarning()
-                return super()._str_contains(pat, case, flags, na, regex)
-            else:
-                result = pc.match_substring_regex(self._pa_array, pat)
+            result = pc.match_substring_regex(self._pa_array, pat, ignore_case=not case)
         else:
-            if case:
-                result = pc.match_substring(self._pa_array, pat)
-            else:
-                result = pc.match_substring(pc.utf8_upper(self._pa_array), pat.upper())
+            result = pc.match_substring(self._pa_array, pat, ignore_case=not case)
         result = BooleanDtype().__from_arrow__(result)
         if not isna(na):
             result[isna(result)] = bool(na)
         return result

     def _str_startswith(self, pat: str, na=None):
-        pat = f"^{re.escape(pat)}"
-        return self._str_contains(pat, na=na, regex=True)
+        result = pc.starts_with(self._pa_array, pattern=pat)
+        if not isna(na):
+            result = result.fill_null(na)
+        result = BooleanDtype().__from_arrow__(result)
+        if not isna(na):
+            result[isna(result)] = bool(na)
+        return result

     def _str_endswith(self, pat: str, na=None):
-        pat = f"{re.escape(pat)}$"
-        return self._str_contains(pat, na=na, regex=True)
+        result = pc.ends_with(self._pa_array, pattern=pat)
+        if not isna(na):
+            result = result.fill_null(na)
+        result = BooleanDtype().__from_arrow__(result)
+        if not isna(na):
+            result[isna(result)] = bool(na)
+        return result

     def _str_replace(
         self,
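The rewritten methods call the pyarrow.compute kernels directly instead of routing prefix/suffix checks through a regex-based `_str_contains`. A minimal standalone sketch of those kernels (assumes a pyarrow version that supports `ignore_case`):

```python
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["Apple", "banana", None, "apricot"])

# Case handling now goes through ignore_case, as in _str_contains
print(pc.match_substring(arr, "AP", ignore_case=True))
print(pc.match_substring_regex(arr, "^a", ignore_case=True))

# Prefix/suffix checks, as in _str_startswith / _str_endswith;
# nulls propagate unless filled explicitly
print(pc.starts_with(arr, pattern="ap"))
print(pc.ends_with(arr, pattern="ot").fill_null(False))
```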

pandas/core/frame.py

Lines changed: 4 additions & 25 deletions

@@ -961,13 +961,6 @@ def _is_homogeneous_type(self) -> bool:
         -------
         bool

-        See Also
-        --------
-        Index._is_homogeneous_type : Whether the object has a single
-            dtype.
-        MultiIndex._is_homogeneous_type : Whether all the levels of a
-            MultiIndex have the same dtype.
-
         Examples
         --------
         >>> DataFrame({"A": [1, 2], "B": [3, 4]})._is_homogeneous_type
@@ -983,12 +976,8 @@ def _is_homogeneous_type(self) -> bool:
         ...                "B": np.array([1, 2], dtype=np.int64)})._is_homogeneous_type
         False
         """
-        if isinstance(self._mgr, ArrayManager):
-            return len({arr.dtype for arr in self._mgr.arrays}) == 1
-        if self._mgr.any_extension_types:
-            return len({block.dtype for block in self._mgr.blocks}) == 1
-        else:
-            return not self._is_mixed_type
+        # The "<" part of "<=" here is for empty DataFrame cases
+        return len({arr.dtype for arr in self._mgr.arrays}) <= 1

     @property
     def _can_fast_transpose(self) -> bool:
@@ -4958,7 +4947,7 @@ def _reindex_multi(
         if row_indexer is not None and col_indexer is not None:
             # Fastpath. By doing two 'take's at once we avoid making an
             # unnecessary copy.
-            # We only get here with `not self._is_mixed_type`, which (almost)
+            # We only get here with `self._can_fast_transpose`, which (almost)
             # ensures that self.values is cheap. It may be worth making this
             # condition more specific.
             indexer = row_indexer, col_indexer
@@ -10849,17 +10838,7 @@ def count(self, axis: Axis = 0, numeric_only: bool = False):
         if len(frame._get_axis(axis)) == 0:
             result = self._constructor_sliced(0, index=frame._get_agg_axis(axis))
         else:
-            if frame._is_mixed_type or frame._mgr.any_extension_types:
-                # the or any_extension_types is really only hit for single-
-                # column frames with an extension array
-                result = notna(frame).sum(axis=axis)
-            else:
-                # GH13407
-                series_counts = notna(frame).sum(axis=axis)
-                counts = series_counts._values
-                result = self._constructor_sliced(
-                    counts, index=frame._get_agg_axis(axis), copy=False
-                )
+            result = notna(frame).sum(axis=axis)

         return result.astype("int64").__finalize__(self, method="count")