Commit 7854611

Merge branch 'pandas-dev:main' into idxmin-idxmax-retrun-wrong-type

2 parents 979a96d + b836a88
File tree

24 files changed: +611 −434 lines

.github/workflows/32-bit-linux.yml

Lines changed: 4 additions & 0 deletions
@@ -52,3 +52,7 @@ jobs:
           name: Test results
           path: test-data.xml
         if: failure()
+concurrency:
+  # https://github.community/t/concurrecy-not-work-for-push/183068/7
+  group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-32bit
+  cancel-in-progress: true
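The `group` expression above uses GitHub Actions' `&&`/`||` chaining as a ternary: `push` events get a unique group per run number (so pushes never cancel each other), while every other event shares one group per ref, letting a new push to the same PR cancel the stale in-progress run. A minimal Python sketch of that keying logic (the helper name is illustrative, not part of the workflow):

```python
def concurrency_group(event_name: str, run_number: int, ref: str) -> str:
    """Mimic `${{ github.event_name == 'push' && github.run_number || github.ref }}-32bit`."""
    # In GitHub Actions expressions, `a && b || c` evaluates like `b if a else c`
    # (as long as `b` is truthy, which a run number always is).
    key = run_number if event_name == "push" else ref
    return f"{key}-32bit"

print(concurrency_group("push", 123, "refs/heads/main"))            # 123-32bit
print(concurrency_group("pull_request", 123, "refs/pull/7/merge"))  # refs/pull/7/merge-32bit
```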

ci/code_checks.sh

Lines changed: 0 additions & 5 deletions
@@ -91,7 +91,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.Series.size \
         pandas.Series.T \
         pandas.Series.hasnans \
-        pandas.Series.to_timestamp \
         pandas.Series.to_list \
         pandas.Series.__iter__ \
         pandas.Series.keys \
@@ -218,7 +217,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.Period.year \
         pandas.Period.asfreq \
         pandas.Period.now \
-        pandas.Period.to_timestamp \
         pandas.arrays.PeriodArray \
         pandas.Interval.closed \
         pandas.Interval.left \
@@ -562,7 +560,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.DataFrame.swapaxes \
         pandas.DataFrame.first_valid_index \
         pandas.DataFrame.last_valid_index \
-        pandas.DataFrame.to_timestamp \
         pandas.DataFrame.attrs \
         pandas.DataFrame.plot \
         pandas.DataFrame.sparse.density \
@@ -576,7 +573,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
     $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX02 --ignore_functions \
         pandas.DataFrame.plot.line \
         pandas.Series.plot.line \
-        pandas.Timestamp.fromtimestamp \
         pandas.api.types.infer_dtype \
         pandas.api.types.is_datetime64_any_dtype \
         pandas.api.types.is_datetime64_ns_dtype \
@@ -590,7 +586,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.api.types.is_timedelta64_dtype \
         pandas.api.types.is_timedelta64_ns_dtype \
         pandas.api.types.is_unsigned_integer_dtype \
-        pandas.core.groupby.DataFrameGroupBy.take \
         pandas.io.formats.style.Styler.concat \
         pandas.io.formats.style.Styler.export \
         pandas.io.formats.style.Styler.set_td_classes \

doc/source/whatsnew/v2.0.0.rst

Lines changed: 3 additions & 2 deletions
@@ -814,7 +814,7 @@ Other API changes
 - The levels of the index of the :class:`Series` returned from ``Series.sparse.from_coo`` now always have dtype ``int32``. Previously they had dtype ``int64`` (:issue:`50926`)
 - :func:`to_datetime` with ``unit`` of either "Y" or "M" will now raise if a sequence contains a non-round ``float`` value, matching the ``Timestamp`` behavior (:issue:`50301`)
 - The methods :meth:`Series.round`, :meth:`DataFrame.__invert__`, :meth:`Series.__invert__`, :meth:`DataFrame.swapaxes`, :meth:`DataFrame.first`, :meth:`DataFrame.last`, :meth:`Series.first`, :meth:`Series.last` and :meth:`DataFrame.align` will now always return new objects (:issue:`51032`)
-- :class:`DataFrameGroupBy` aggregations (e.g. "sum") with object-dtype columns no longer infer non-object dtypes for their results, explicitly call ``result.infer_objects(copy=False)`` on the result to obtain the old behavior (:issue:`51205`)
+- :class:`DataFrame` and :class:`DataFrameGroupBy` aggregations (e.g. "sum") with object-dtype columns no longer infer non-object dtypes for their results, explicitly call ``result.infer_objects(copy=False)`` on the result to obtain the old behavior (:issue:`51205`, :issue:`49603`)
 - Added :func:`pandas.api.types.is_any_real_numeric_dtype` to check for real numeric dtypes (:issue:`51152`)
 -
 
@@ -1226,10 +1226,11 @@ Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.add` cannot apply ufunc when inputs contain mixed DataFrame type and Series type (:issue:`39853`)
 - Bug in arithmetic operations on :class:`Series` not propagating mask when combining masked dtypes and numpy dtypes (:issue:`45810`, :issue:`42630`)
-- Bug in DataFrame reduction methods (e.g. :meth:`DataFrame.sum`) with object dtype, ``axis=1`` and ``numeric_only=False`` would not be coerced to float (:issue:`49551`)
 - Bug in :meth:`DataFrame.sem` and :meth:`Series.sem` where an erroneous ``TypeError`` would always raise when using data backed by an :class:`ArrowDtype` (:issue:`49759`)
 - Bug in :meth:`Series.__add__` casting to object for list and masked :class:`Series` (:issue:`22962`)
+- Bug in :meth:`~arrays.ArrowExtensionArray.mode` where ``dropna=False`` was not respected when there was ``NA`` values (:issue:`50982`)
 - Bug in :meth:`DataFrame.query` with ``engine="numexpr"`` and column names are ``min`` or ``max`` would raise a ``TypeError`` (:issue:`50937`)
+- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` with tz-aware data containing ``pd.NaT`` and ``axis=1`` would return incorrect results (:issue:`51242`)
 
 Conversion
 ^^^^^^^^^^
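The reworded "Other API changes" entry above says aggregations on object-dtype columns no longer infer non-object result dtypes, and suggests ``result.infer_objects(copy=False)`` to recover the old behavior. A small sketch of that pattern, assuming pandas ≥ 2.0 is installed:

```python
import pandas as pd

# Integers stored in an object-dtype Series, as in the issues the entry cites.
ser = pd.Series([1, 2, 3], dtype=object, name="val")

# Under the new behavior the aggregation result keeps object dtype
# rather than being inferred to int64.
grouped = ser.groupby(["a", "a", "b"]).sum()

# The whatsnew entry's suggested opt-in to the old inference; the entry
# passes copy=False to avoid an extra copy, omitted here for brevity.
restored = grouped.infer_objects()
print(restored.dtype)  # int64
```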

pandas/_libs/tslibs/nattype.pyx

Lines changed: 1 addition & 1 deletion
@@ -570,7 +570,7 @@ class NaTType(_NaT):
 
         Examples
         --------
-        >>> pd.Timestamp.fromtimestamp(1584199972)
+        >>> pd.Timestamp.fromtimestamp(1584199972)  # doctest: +SKIP
         Timestamp('2020-03-14 15:32:52')
 
         Note that the output may change depending on your local time.

pandas/_libs/tslibs/period.pyx

Lines changed: 7 additions & 0 deletions
@@ -1841,6 +1841,13 @@ cdef class _Period(PeriodMixin):
         Returns
         -------
         Timestamp
+
+        Examples
+        --------
+        >>> period = pd.Period('2023-1-1', freq='D')
+        >>> timestamp = period.to_timestamp()
+        >>> timestamp
+        Timestamp('2023-01-01 00:00:00')
         """
         how = validate_end_alias(how)
 

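The docstring example added above shows the default conversion, which anchors at the start of the period. `to_timestamp` also accepts a `how` argument; a sketch complementing the example (assuming modern pandas behavior, where `how="end"` resolves to the last nanosecond of the period):

```python
import pandas as pd

# Matches the new docstring example: the default is the start of the period.
period = pd.Period("2023-1-1", freq="D")
start = period.to_timestamp()

# `how="end"` anchors at the end of the period instead.
end = period.to_timestamp(how="end")
print(start, end)
```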
pandas/_libs/tslibs/timestamps.pyx

Lines changed: 1 addition & 1 deletion
@@ -1451,7 +1451,7 @@ class Timestamp(_Timestamp):
 
         Examples
         --------
-        >>> pd.Timestamp.fromtimestamp(1584199972)
+        >>> pd.Timestamp.fromtimestamp(1584199972)  # doctest: +SKIP
         Timestamp('2020-03-14 15:32:52')
 
         Note that the output may change depending on your local time.
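Both this hunk and the identical one in `nattype.pyx` mark the `fromtimestamp` doctest with `# doctest: +SKIP` because, as the docstring's closing note says, the result depends on the machine's local timezone and so cannot be verified reproducibly in CI. A stdlib sketch of the underlying issue and the deterministic alternative:

```python
from datetime import datetime, timezone

# The doctest output varies with the local timezone, so its expected
# value cannot be checked on arbitrary CI machines:
local = datetime.fromtimestamp(1584199972)

# Pinning a timezone makes the same epoch value deterministic everywhere:
utc = datetime.fromtimestamp(1584199972, tz=timezone.utc)
print(utc)  # 2020-03-14 15:32:52+00:00
```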

pandas/conftest.py

Lines changed: 8 additions & 0 deletions
@@ -293,6 +293,14 @@ def ordered(request):
     return request.param
 
 
+@pytest.fixture(params=[True, False])
+def skipna(request):
+    """
+    Boolean 'skipna' parameter.
+    """
+    return request.param
+
+
 @pytest.fixture(params=["first", "last", False])
 def keep(request):
     """

pandas/core/algorithms.py

Lines changed: 0 additions & 226 deletions
@@ -8,11 +8,8 @@
 from textwrap import dedent
 from typing import (
     TYPE_CHECKING,
-    Hashable,
     Literal,
-    Sequence,
     cast,
-    final,
 )
 import warnings
 
@@ -29,7 +26,6 @@
     ArrayLike,
     AxisInt,
     DtypeObj,
-    IndexLabel,
     TakeIndexer,
     npt,
 )
@@ -97,7 +93,6 @@
 
 from pandas import (
     Categorical,
-    DataFrame,
     Index,
     Series,
 )
@@ -1167,227 +1162,6 @@ def checked_add_with_arr(
     return result
 
 
-# --------------- #
-# select n        #
-# --------------- #
-
-
-class SelectN:
-    def __init__(self, obj, n: int, keep: str) -> None:
-        self.obj = obj
-        self.n = n
-        self.keep = keep
-
-        if self.keep not in ("first", "last", "all"):
-            raise ValueError('keep must be either "first", "last" or "all"')
-
-    def compute(self, method: str) -> DataFrame | Series:
-        raise NotImplementedError
-
-    @final
-    def nlargest(self):
-        return self.compute("nlargest")
-
-    @final
-    def nsmallest(self):
-        return self.compute("nsmallest")
-
-    @final
-    @staticmethod
-    def is_valid_dtype_n_method(dtype: DtypeObj) -> bool:
-        """
-        Helper function to determine if dtype is valid for
-        nsmallest/nlargest methods
-        """
-        return (
-            not is_complex_dtype(dtype)
-            if is_numeric_dtype(dtype)
-            else needs_i8_conversion(dtype)
-        )
-
-
-class SelectNSeries(SelectN):
-    """
-    Implement n largest/smallest for Series
-
-    Parameters
-    ----------
-    obj : Series
-    n : int
-    keep : {'first', 'last'}, default 'first'
-
-    Returns
-    -------
-    nordered : Series
-    """
-
-    def compute(self, method: str) -> Series:
-        from pandas.core.reshape.concat import concat
-
-        n = self.n
-        dtype = self.obj.dtype
-        if not self.is_valid_dtype_n_method(dtype):
-            raise TypeError(f"Cannot use method '{method}' with dtype {dtype}")
-
-        if n <= 0:
-            return self.obj[[]]
-
-        dropped = self.obj.dropna()
-        nan_index = self.obj.drop(dropped.index)
-
-        # slow method
-        if n >= len(self.obj):
-            ascending = method == "nsmallest"
-            return self.obj.sort_values(ascending=ascending).head(n)
-
-        # fast method
-        new_dtype = dropped.dtype
-        arr = _ensure_data(dropped.values)
-        if method == "nlargest":
-            arr = -arr
-            if is_integer_dtype(new_dtype):
-                # GH 21426: ensure reverse ordering at boundaries
-                arr -= 1
-
-            elif is_bool_dtype(new_dtype):
-                # GH 26154: ensure False is smaller than True
-                arr = 1 - (-arr)
-
-        if self.keep == "last":
-            arr = arr[::-1]
-
-        nbase = n
-        narr = len(arr)
-        n = min(n, narr)
-
-        # arr passed into kth_smallest must be contiguous. We copy
-        # here because kth_smallest will modify its input
-        kth_val = algos.kth_smallest(arr.copy(order="C"), n - 1)
-        (ns,) = np.nonzero(arr <= kth_val)
-        inds = ns[arr[ns].argsort(kind="mergesort")]
-
-        if self.keep != "all":
-            inds = inds[:n]
-            findex = nbase
-        else:
-            if len(inds) < nbase <= len(nan_index) + len(inds):
-                findex = len(nan_index) + len(inds)
-            else:
-                findex = len(inds)
-
-        if self.keep == "last":
-            # reverse indices
-            inds = narr - 1 - inds
-
-        return concat([dropped.iloc[inds], nan_index]).iloc[:findex]
-
-
-class SelectNFrame(SelectN):
-    """
-    Implement n largest/smallest for DataFrame
-
-    Parameters
-    ----------
-    obj : DataFrame
-    n : int
-    keep : {'first', 'last'}, default 'first'
-    columns : list or str
-
-    Returns
-    -------
-    nordered : DataFrame
-    """
-
-    def __init__(self, obj: DataFrame, n: int, keep: str, columns: IndexLabel) -> None:
-        super().__init__(obj, n, keep)
-        if not is_list_like(columns) or isinstance(columns, tuple):
-            columns = [columns]
-
-        columns = cast(Sequence[Hashable], columns)
-        columns = list(columns)
-        self.columns = columns
-
-    def compute(self, method: str) -> DataFrame:
-        from pandas.core.api import Index
-
-        n = self.n
-        frame = self.obj
-        columns = self.columns
-
-        for column in columns:
-            dtype = frame[column].dtype
-            if not self.is_valid_dtype_n_method(dtype):
-                raise TypeError(
-                    f"Column {repr(column)} has dtype {dtype}, "
-                    f"cannot use method {repr(method)} with this dtype"
-                )
-
-        def get_indexer(current_indexer, other_indexer):
-            """
-            Helper function to concat `current_indexer` and `other_indexer`
-            depending on `method`
-            """
-            if method == "nsmallest":
-                return current_indexer.append(other_indexer)
-            else:
-                return other_indexer.append(current_indexer)
-
-        # Below we save and reset the index in case index contains duplicates
-        original_index = frame.index
-        cur_frame = frame = frame.reset_index(drop=True)
-        cur_n = n
-        indexer = Index([], dtype=np.int64)
-
-        for i, column in enumerate(columns):
-            # For each column we apply method to cur_frame[column].
-            # If it's the last column or if we have the number of
-            # results desired we are done.
-            # Otherwise there are duplicates of the largest/smallest
-            # value and we need to look at the rest of the columns
-            # to determine which of the rows with the largest/smallest
-            # value in the column to keep.
-            series = cur_frame[column]
-            is_last_column = len(columns) - 1 == i
-            values = getattr(series, method)(
-                cur_n, keep=self.keep if is_last_column else "all"
-            )
-
-            if is_last_column or len(values) <= cur_n:
-                indexer = get_indexer(indexer, values.index)
-                break
-
-            # Now find all values which are equal to
-            # the (nsmallest: largest)/(nlargest: smallest)
-            # from our series.
-            border_value = values == values[values.index[-1]]
-
-            # Some of these values are among the top-n
-            # some aren't.
-            unsafe_values = values[border_value]
-
-            # These values are definitely among the top-n
-            safe_values = values[~border_value]
-            indexer = get_indexer(indexer, safe_values.index)
-
-            # Go on and separate the unsafe_values on the remaining
-            # columns.
-            cur_frame = cur_frame.loc[unsafe_values.index]
-            cur_n = n - len(indexer)
-
-        frame = frame.take(indexer)
-
-        # Restore the index on frame
-        frame.index = original_index.take(indexer)
-
-        # If there is only one column, the frame is already sorted.
-        if len(columns) == 1:
-            return frame
-
-        ascending = method == "nsmallest"
-
-        return frame.sort_values(columns, ascending=ascending, kind="mergesort")
-
-
 # ---- #
 # take #
 # ---- #

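The merged main branch removes the `SelectN`/`SelectNSeries`/`SelectNFrame` machinery from `pandas/core/algorithms.py`. The heart of the deleted Series fast path is: partition the array around the n-th smallest value, keep only candidates at or below it, then stable-sort just those few. A simplified NumPy sketch of that idea, using `np.partition` where the original called `algos.kth_smallest` (the helper name is illustrative; the removed code additionally handled NaN placement, `keep="last"`/`"all"`, and dtype coercion):

```python
import numpy as np

def nsmallest_indices(arr: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n smallest values, ties broken by original position
    (mirroring keep='first' in the removed SelectNSeries.compute)."""
    n = min(n, len(arr))
    if n <= 0:
        return np.array([], dtype=np.intp)
    # Partition around the n-th smallest value; np.partition stands in for
    # algos.kth_smallest (which mutated a contiguous copy of its input).
    kth_val = np.partition(arr, n - 1)[n - 1]
    # Every candidate <= kth_val might belong to the result; stable-sort
    # only those few rather than the whole array.
    (ns,) = np.nonzero(arr <= kth_val)
    inds = ns[arr[ns].argsort(kind="mergesort")]
    return inds[:n]

vals = np.array([5, 1, 4, 1, 3])
print(nsmallest_indices(vals, 3))  # [1 3 4]
```

The partition step is O(len(arr)); only the small candidate set pays the O(k log k) sort, which is why the removed code called this the "fast method" and fell back to a full `sort_values` when n covered the whole object.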
0 commit comments