Commit ce8637a

Merge remote-tracking branch 'upstream/main' into test-build-test
2 parents f3c1000 + 8117a55 commit ce8637a

43 files changed: +665 -478 lines changed

asv_bench/benchmarks/array.py

Lines changed: 3 additions & 0 deletions
@@ -90,6 +90,9 @@ def time_setitem_list(self, multiple_chunks):
     def time_setitem_slice(self, multiple_chunks):
         self.array[::10] = "foo"
 
+    def time_setitem_null_slice(self, multiple_chunks):
+        self.array[:] = "foo"
+
     def time_tolist(self, multiple_chunks):
         self.array.tolist()

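For context, a rough sketch of the operation the new ``time_setitem_null_slice`` benchmark exercises: assigning a scalar through the null slice ``[:]``. The benchmark's ``self.array`` is built in a ``setup`` method outside this hunk, so the pyarrow-backed string array below is an assumption for illustration only.

>>> import pandas as pd
>>> arr = pd.array(["a", "b", None] * 1000, dtype="string[pyarrow]")  # assumed stand-in for self.array
>>> arr[:] = "foo"     # null slice: replaces every element
>>> arr[::10] = "foo"  # the existing time_setitem_slice benchmark covers strided slices

The v2.0.0 whatsnew hunk further down notes a fast path for exactly this null-slice case in ``ArrowExtensionArray.__setitem__`` (:issue:`50248`).
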
asv_bench/benchmarks/reshape.py

Lines changed: 10 additions & 5 deletions
@@ -15,12 +15,17 @@
 
 
 class Melt:
-    def setup(self):
-        self.df = DataFrame(np.random.randn(10000, 3), columns=["A", "B", "C"])
-        self.df["id1"] = np.random.randint(0, 10, 10000)
-        self.df["id2"] = np.random.randint(100, 1000, 10000)
+    params = ["float64", "Float64"]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        self.df = DataFrame(
+            np.random.randn(100_000, 3), columns=["A", "B", "C"], dtype=dtype
+        )
+        self.df["id1"] = pd.Series(np.random.randint(0, 10, 10000))
+        self.df["id2"] = pd.Series(np.random.randint(100, 1000, 10000))
 
-    def time_melt_dataframe(self):
+    def time_melt_dataframe(self, dtype):
         melt(self.df, id_vars=["id1", "id2"])
 

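A minimal sketch of what the parametrized ``Melt`` benchmark now measures: melting a frame whose value columns use the nullable ``Float64`` dtype as well as plain ``float64``. The column names mirror the benchmark; the tiny frame size is only for illustration.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(5, 3), columns=["A", "B", "C"], dtype="Float64")
>>> df["id1"] = np.random.randint(0, 10, 5)
>>> df["id2"] = np.random.randint(100, 1000, 5)
>>> melted = pd.melt(df, id_vars=["id1", "id2"])
>>> print(melted["value"].dtype)  # with the DataFrame.melt fix noted in v2.0.0 (GH 41570), the extension dtype is kept
Float64
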
ci/deps/actions-38-downstream_compat.yaml

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ dependencies:
   - zstandard
 
   # downstream packages
-  - aiobotocore
   - botocore
   - cftime
   - dask

doc/source/whatsnew/v1.5.3.rst

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Bug fixes
 
 Other
 ~~~~~
+- Reverted deprecation (:issue:`45324`) of behavior of :meth:`Series.__getitem__` and :meth:`Series.__setitem__` slicing with an integer :class:`Index`; this will remain positional (:issue:`49612`)
 -
 
 .. ---------------------------------------------------------------------------

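To make the reverted deprecation concrete, here is a short illustration of what "slicing remains positional" means for a :class:`Series` with an integer :class:`Index` (the values are chosen arbitrarily):

>>> import pandas as pd
>>> ser = pd.Series([10, 20, 30], index=[4, 5, 6])
>>> ser[1:]      # positional: rows 1 and 2, not labels >= 1
5    20
6    30
dtype: int64
>>> ser.loc[5:]  # label-based slicing remains available via .loc
5    20
6    30
dtype: int64
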
doc/source/whatsnew/v2.0.0.rst

Lines changed: 6 additions & 1 deletion
@@ -42,7 +42,7 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 Additionally a new global configuration, ``mode.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.
 
-* :func:`read_csv` (with ``engine="pyarrow"``)
+* :func:`read_csv` (with ``engine="pyarrow"`` or ``engine="python"``)
 * :func:`read_excel`
 * :func:`read_parquet`
 * :func:`read_orc`
@@ -736,6 +736,7 @@ Performance improvements
 - Performance improvement in :meth:`MultiIndex.isin` when ``level=None`` (:issue:`48622`, :issue:`49577`)
 - Performance improvement in :meth:`MultiIndex.putmask` (:issue:`49830`)
 - Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
+- Performance improvement in :meth:`Series.rank` for pyarrow-backed dtypes (:issue:`50264`)
 - Performance improvement in :meth:`Series.fillna` for extension array dtypes (:issue:`49722`, :issue:`50078`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
@@ -748,6 +749,7 @@ Performance improvements
 - Reduce memory usage of :meth:`DataFrame.to_pickle`/:meth:`Series.to_pickle` when using BZ2 or LZMA (:issue:`49068`)
 - Performance improvement for :class:`~arrays.StringArray` constructor passing a numpy array with type ``np.str_`` (:issue:`49109`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.factorize` (:issue:`49177`)
+- Performance improvement in :meth:`~arrays.ArrowExtensionArray.__setitem__` when key is a null slice (:issue:`50248`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`49973`)
 - Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
 - Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
@@ -831,6 +833,7 @@ Interval
 
 Indexing
 ^^^^^^^^
+- Bug in :meth:`DataFrame.__setitem__` raising when indexer is a :class:`DataFrame` with ``boolean`` dtype (:issue:`47125`)
 - Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for ``uint`` dtypes (:issue:`48184`)
 - Bug in :meth:`DataFrame.loc` coercing dtypes when setting values with a list indexer (:issue:`49159`)
 - Bug in :meth:`DataFrame.loc` raising ``ValueError`` with ``bool`` indexer and :class:`MultiIndex` (:issue:`47687`)
@@ -870,6 +873,7 @@ I/O
 - Bug in :func:`read_sas` caused fragmentation of :class:`DataFrame` and raised :class:`.errors.PerformanceWarning` (:issue:`48595`)
 - Improved error message in :func:`read_excel` by including the offending sheet name when an exception is raised while reading a file (:issue:`48706`)
 - Bug when a pickling a subset PyArrow-backed data that would serialize the entire data instead of the subset (:issue:`42600`)
+- Bug in :func:`read_sql_query` ignoring ``dtype`` argument when ``chunksize`` is specified and result is empty (:issue:`50245`)
 - Bug in :func:`read_csv` for a single-line csv with fewer columns than ``names`` raised :class:`.errors.ParserError` with ``engine="c"`` (:issue:`47566`)
 - Bug in displaying ``string`` dtypes not showing storage option (:issue:`50099`)
 - Bug in :func:`DataFrame.to_string` with ``header=False`` that printed the index name on the same line as the first row of the data (:issue:`49230`)
@@ -906,6 +910,7 @@ Reshaping
 ^^^^^^^^^
 - Bug in :meth:`DataFrame.pivot_table` raising ``TypeError`` for nullable dtype and ``margins=True`` (:issue:`48681`)
 - Bug in :meth:`DataFrame.unstack` and :meth:`Series.unstack` unstacking wrong level of :class:`MultiIndex` when :class:`MultiIndex` has mixed names (:issue:`48763`)
+- Bug in :meth:`DataFrame.melt` losing extension array dtype (:issue:`41570`)
 - Bug in :meth:`DataFrame.pivot` not respecting ``None`` as column name (:issue:`48293`)
 - Bug in :func:`join` when ``left_on`` or ``right_on`` is or includes a :class:`CategoricalIndex` incorrectly raising ``AttributeError`` (:issue:`48464`)
 - Bug in :meth:`DataFrame.pivot_table` raising ``ValueError`` with parameter ``margins=True`` when result is an empty :class:`DataFrame` (:issue:`49240`)

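A hedged sketch of how the expanded ``use_nullable_dtypes`` support described in the v2.0.0 notes above is meant to be combined with the ``mode.nullable_backend`` option. The file name is a placeholder, and ``"pyarrow"`` as the option value is an assumption based on the surrounding notes rather than something shown in this diff:

>>> import pandas as pd
>>> with pd.option_context("mode.nullable_backend", "pyarrow"):
...     df = pd.read_csv("data.csv", use_nullable_dtypes=True, engine="python")  # "data.csv" is a placeholder path
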
environment.yml

Lines changed: 1 addition & 2 deletions
@@ -60,7 +60,6 @@ dependencies:
   - zstandard
 
   # downstream packages
-  - aiobotocore<2.0.0  # GH#44311 pinned to fix docbuild
   - dask-core
   - seaborn-base
 
@@ -69,7 +68,7 @@ dependencies:
   - flask
 
   # benchmarks
-  - asv
+  - asv>=0.5.1
 
   # The compiler packages are meta-packages and install the correct compiler (activation) packages on the respective platforms.
   - c-compiler

pandas/_libs/tslib.pyi

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ def format_array_from_datetime(
     reso: int = ...,  # NPY_DATETIMEUNIT
 ) -> npt.NDArray[np.object_]: ...
 def array_with_unit_to_datetime(
-    values: np.ndarray,
+    values: npt.NDArray[np.object_],
     unit: str,
     errors: str = ...,
 ) -> tuple[np.ndarray, tzinfo | None]: ...

pandas/_libs/tslib.pyx

Lines changed: 5 additions & 52 deletions
@@ -18,7 +18,6 @@ import_datetime()
 
 cimport numpy as cnp
 from numpy cimport (
-    float64_t,
     int64_t,
     ndarray,
 )
@@ -231,7 +230,7 @@ def format_array_from_datetime(
 
 
 def array_with_unit_to_datetime(
-    ndarray values,
+    ndarray[object] values,
     str unit,
     str errors="coerce"
 ):
@@ -266,70 +265,24 @@ def array_with_unit_to_datetime(
     cdef:
         Py_ssize_t i, n=len(values)
         int64_t mult
-        int prec = 0
-        ndarray[float64_t] fvalues
         bint is_ignore = errors=="ignore"
         bint is_coerce = errors=="coerce"
        bint is_raise = errors=="raise"
-        bint need_to_iterate = True
         ndarray[int64_t] iresult
         ndarray[object] oresult
-        ndarray mask
         object tz = None
 
     assert is_ignore or is_coerce or is_raise
 
     if unit == "ns":
-        if issubclass(values.dtype.type, (np.integer, np.float_)):
-            result = values.astype("M8[ns]", copy=False)
-        else:
-            result, tz = array_to_datetime(
-                values.astype(object, copy=False),
-                errors=errors,
-            )
+        result, tz = array_to_datetime(
+            values.astype(object, copy=False),
+            errors=errors,
+        )
         return result, tz
 
     mult, _ = precision_from_unit(unit)
 
-    if is_raise:
-        # try a quick conversion to i8/f8
-        # if we have nulls that are not type-compat
-        # then need to iterate
-
-        if values.dtype.kind in ["i", "f", "u"]:
-            iresult = values.astype("i8", copy=False)
-            # fill missing values by comparing to NPY_NAT
-            mask = iresult == NPY_NAT
-            # Trying to Convert NaN to integer results in undefined
-            # behaviour, so handle it explicitly (see GH #48705)
-            if values.dtype.kind == "f":
-                mask |= values != values
-            iresult[mask] = 0
-            fvalues = iresult.astype("f8") * mult
-            need_to_iterate = False
-
-        if not need_to_iterate:
-            # check the bounds
-            if (fvalues < Timestamp.min.value).any() or (
-                (fvalues > Timestamp.max.value).any()
-            ):
-                raise OutOfBoundsDatetime(f"cannot convert input with unit '{unit}'")
-
-            if values.dtype.kind in ["i", "u"]:
-                result = (iresult * mult).astype("M8[ns]")
-
-            elif values.dtype.kind == "f":
-                fresult = (values * mult).astype("f8")
-                fresult[mask] = 0
-                if prec:
-                    fresult = round(fresult, prec)
-                result = fresult.astype("M8[ns]", copy=False)
-
-            iresult = result.view("i8")
-            iresult[mask] = NPY_NAT
-
-            return result, tz
-
     result = np.empty(n, dtype="M8[ns]")
     iresult = result.view("i8")
 

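The refactoring above narrows ``array_with_unit_to_datetime`` to object-dtype input (matching the ``tslib.pyi`` stub change), dropping the numeric fast paths it used to carry. From the user side it backs unit-based conversion along these lines; which internal code path :func:`to_datetime` dispatches to is an implementation detail, so treat this as an approximate illustration:

>>> import pandas as pd
>>> pd.to_datetime([1670599800, None], unit="s", errors="coerce")
DatetimeIndex(['2022-12-09 15:30:00', 'NaT'], dtype='datetime64[ns]', freq=None)
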
pandas/_libs/tslibs/np_datetime.pyx

Lines changed: 7 additions & 7 deletions
@@ -312,10 +312,10 @@ cpdef ndarray astype_overflowsafe(
     """
     if values.descr.type_num == dtype.type_num == cnp.NPY_DATETIME:
         # i.e. dtype.kind == "M"
-        pass
+        dtype_name = "datetime64"
     elif values.descr.type_num == dtype.type_num == cnp.NPY_TIMEDELTA:
         # i.e. dtype.kind == "m"
-        pass
+        dtype_name = "timedelta64"
     else:
         raise TypeError(
             "astype_overflowsafe values.dtype and dtype must be either "
@@ -326,14 +326,14 @@ cpdef ndarray astype_overflowsafe(
         NPY_DATETIMEUNIT from_unit = get_unit_from_dtype(values.dtype)
         NPY_DATETIMEUNIT to_unit = get_unit_from_dtype(dtype)
 
-    if (
-        from_unit == NPY_DATETIMEUNIT.NPY_FR_GENERIC
-        or to_unit == NPY_DATETIMEUNIT.NPY_FR_GENERIC
-    ):
+    if from_unit == NPY_DATETIMEUNIT.NPY_FR_GENERIC:
+        raise TypeError(f"{dtype_name} values must have a unit specified")
+
+    if to_unit == NPY_DATETIMEUNIT.NPY_FR_GENERIC:
         # without raising explicitly here, we end up with a SystemError
         # built-in function [...] returned a result with an error
         raise ValueError(
-            "datetime64/timedelta64 values and dtype must have a unit specified"
+            f"{dtype_name} dtype must have a unit specified"
         )
 
     if from_unit == to_unit:

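The net effect of the ``astype_overflowsafe`` change above is a dtype-specific message when a unit is missing. A sketch calling the internal helper directly; this is private API shown only for illustration, and the import path is an assumption based on this file's location:

>>> import numpy as np
>>> from pandas._libs.tslibs.np_datetime import astype_overflowsafe
>>> values = np.array(["2022-12-09"], dtype="datetime64[ns]")
>>> astype_overflowsafe(values, np.dtype("datetime64"))  # target dtype carries no unit
Traceback (most recent call last):
    ...
ValueError: datetime64 dtype must have a unit specified
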
pandas/_libs/tslibs/offsets.pyx

Lines changed: 60 additions & 17 deletions
@@ -1494,11 +1494,29 @@ cdef class BusinessDay(BusinessMixin):
     """
     DateOffset subclass representing possibly n business days.
 
+    Parameters
+    ----------
+    n : int, default 1
+        The number of days represented.
+    normalize : bool, default False
+        Normalize start/end dates to midnight.
+
     Examples
     --------
-    >>> ts = pd.Timestamp(2022, 8, 5)
-    >>> ts + pd.offsets.BusinessDay()
-    Timestamp('2022-08-08 00:00:00')
+    You can use the parameter ``n`` to represent a shift of n business days.
+
+    >>> ts = pd.Timestamp(2022, 12, 9, 15)
+    >>> ts.strftime('%a %d %b %Y %H:%M')
+    'Fri 09 Dec 2022 15:00'
+    >>> (ts + pd.offsets.BusinessDay(n=5)).strftime('%a %d %b %Y %H:%M')
+    'Fri 16 Dec 2022 15:00'
+
+    Passing the parameter ``normalize`` equal to True, you shift the start
+    of the next business day to midnight.
+
+    >>> ts = pd.Timestamp(2022, 12, 9, 15)
+    >>> ts + pd.offsets.BusinessDay(normalize=True)
+    Timestamp('2022-12-12 00:00:00')
     """
     _period_dtype_code = PeriodDtypeCode.B
     _prefix = "B"
@@ -1610,29 +1628,53 @@ cdef class BusinessHour(BusinessMixin):
     Parameters
     ----------
     n : int, default 1
-        The number of months represented.
+        The number of hours represented.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
-    weekmask : str, Default 'Mon Tue Wed Thu Fri'
-        Weekmask of valid business days, passed to ``numpy.busdaycalendar``.
     start : str, time, or list of str/time, default "09:00"
         Start time of your custom business hour in 24h format.
     end : str, time, or list of str/time, default: "17:00"
         End time of your custom business hour in 24h format.
 
     Examples
     --------
-    >>> from datetime import time
+    You can use the parameter ``n`` to represent a shift of n hours.
+
+    >>> ts = pd.Timestamp(2022, 12, 9, 8)
+    >>> ts + pd.offsets.BusinessHour(n=5)
+    Timestamp('2022-12-09 14:00:00')
+
+    You can also change the start and the end of business hours.
+
     >>> ts = pd.Timestamp(2022, 8, 5, 16)
-    >>> ts + pd.offsets.BusinessHour()
-    Timestamp('2022-08-08 09:00:00')
     >>> ts + pd.offsets.BusinessHour(start="11:00")
     Timestamp('2022-08-08 11:00:00')
-    >>> ts + pd.offsets.BusinessHour(end=time(19, 0))
-    Timestamp('2022-08-05 17:00:00')
-    >>> ts + pd.offsets.BusinessHour(start=[time(9, 0), "20:00"],
-    ...                              end=["17:00", time(22, 0)])
-    Timestamp('2022-08-05 20:00:00')
+
+    >>> from datetime import time as dt_time
+    >>> ts = pd.Timestamp(2022, 8, 5, 22)
+    >>> ts + pd.offsets.BusinessHour(end=dt_time(19, 0))
+    Timestamp('2022-08-08 10:00:00')
+
+    Passing the parameter ``normalize`` equal to True, you shift the start
+    of the next business hour to midnight.
+
+    >>> ts = pd.Timestamp(2022, 12, 9, 8)
+    >>> ts + pd.offsets.BusinessHour(normalize=True)
+    Timestamp('2022-12-09 00:00:00')
+
+    You can divide your business day hours into several parts.
+
+    >>> import datetime as dt
+    >>> freq = pd.offsets.BusinessHour(start=["06:00", "10:00", "15:00"],
+    ...                                end=["08:00", "12:00", "17:00"])
+    >>> pd.date_range(dt.datetime(2022, 12, 9), dt.datetime(2022, 12, 13), freq=freq)
+    DatetimeIndex(['2022-12-09 06:00:00', '2022-12-09 07:00:00',
+                   '2022-12-09 10:00:00', '2022-12-09 11:00:00',
+                   '2022-12-09 15:00:00', '2022-12-09 16:00:00',
+                   '2022-12-12 06:00:00', '2022-12-12 07:00:00',
+                   '2022-12-12 10:00:00', '2022-12-12 11:00:00',
+                   '2022-12-12 15:00:00', '2022-12-12 16:00:00'],
+                  dtype='datetime64[ns]', freq='BH')
     """
 
     _prefix = "BH"
@@ -3536,6 +3578,7 @@ cdef class CustomBusinessDay(BusinessDay):
     Parameters
     ----------
     n : int, default 1
+        The number of days represented.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
     weekmask : str, Default 'Mon Tue Wed Thu Fri'
@@ -3624,7 +3667,7 @@ cdef class CustomBusinessHour(BusinessHour):
     Parameters
     ----------
     n : int, default 1
-        The number of months represented.
+        The number of hours represented.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
     weekmask : str, Default 'Mon Tue Wed Thu Fri'
@@ -3662,7 +3705,7 @@ cdef class CustomBusinessHour(BusinessHour):
     >>> ts + pd.offsets.CustomBusinessHour(end=dt_time(19, 0))
     Timestamp('2022-08-08 10:00:00')
 
-    In the example below we divide our business day hours into several parts.
+    You can divide your business day hours into several parts.
 
     >>> import datetime as dt
    >>> freq = pd.offsets.CustomBusinessHour(start=["06:00", "10:00", "15:00"],
@@ -3692,7 +3735,7 @@ cdef class CustomBusinessHour(BusinessHour):
            'Fri 16 Dec 2022 12:00'],
           dtype='object')
 
-    In the example below we define custom holidays by using NumPy business day calendar.
+    Using NumPy business day calendar you can define custom holidays.
 
     >>> import datetime as dt
     >>> bdc = np.busdaycalendar(holidays=['2022-12-12', '2022-12-14'])

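The truncated last hunk introduces a ``np.busdaycalendar`` with two holidays. A hedged continuation of that idea, using the ``calendar`` argument of :class:`CustomBusinessDay`; the argument name and the expected output are assumptions, not part of this diff:

>>> import numpy as np
>>> import pandas as pd
>>> bdc = np.busdaycalendar(holidays=['2022-12-12', '2022-12-14'])
>>> freq = pd.offsets.CustomBusinessDay(calendar=bdc)
>>> pd.date_range('2022-12-09', '2022-12-15', freq=freq)
DatetimeIndex(['2022-12-09', '2022-12-13', '2022-12-15'], dtype='datetime64[ns]', freq='C')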