Commit 8639ad3

Author: Marco Gorelli (committed)
Merge remote-tracking branch 'upstream/main' into pr/nikitaved-qssummer/format_iso
2 parents c520a51 + b48735a


66 files changed: +293 additions, −754 deletions

doc/source/whatsnew/v2.0.0.rst (19 additions, 1 deletion)

@@ -192,10 +192,24 @@ Removal of prior version deprecations/changes
 - Removed argument ``sort_columns`` in :meth:`DataFrame.plot` and :meth:`Series.plot` (:issue:`47563`)
 - Removed argument ``is_copy`` from :meth:`DataFrame.take` and :meth:`Series.take` (:issue:`30615`)
 - Removed argument ``kind`` from :meth:`Index.get_slice_bound`, :meth:`Index.slice_indexer` and :meth:`Index.slice_locs` (:issue:`41378`)
+- Disallow subclass-specific keywords (e.g. "freq", "tz", "names", "closed") in the :class:`Index` constructor (:issue:`38597`)
 - Removed argument ``inplace`` from :meth:`Categorical.remove_unused_categories` (:issue:`37918`)
 - Disallow passing non-round floats to :class:`Timestamp` with ``unit="M"`` or ``unit="Y"`` (:issue:`47266`)
 - Remove keywords ``convert_float`` and ``mangle_dupe_cols`` from :func:`read_excel` (:issue:`41176`)
 - Disallow passing non-keyword arguments to :func:`read_excel` except ``io`` and ``sheet_name`` (:issue:`34418`)
+- Disallow passing non-keyword arguments to :meth:`DataFrame.drop_duplicates` except for ``subset`` (:issue:`41485`)
+- Disallow passing non-keyword arguments to :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` (:issue:`41506`)
+- Disallow passing non-keyword arguments to :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` except for ``method`` (:issue:`41510`)
+- Disallow passing non-keyword arguments to :meth:`DataFrame.any` and :meth:`Series.any` (:issue:`44896`)
+- Disallow passing non-keyword arguments to :meth:`Index.set_names` except for ``names`` (:issue:`41551`)
+- Disallow passing non-keyword arguments to :meth:`Index.join` except for ``other`` (:issue:`46518`)
+- Disallow passing non-keyword arguments to :func:`concat` except for ``objs`` (:issue:`41485`)
+- Disallow passing non-keyword arguments to :func:`pivot` except for ``data`` (:issue:`48301`)
+- Disallow passing non-keyword arguments to :meth:`DataFrame.pivot` (:issue:`48301`)
+- Disallow passing non-keyword arguments to :func:`read_json` except for ``path_or_buf`` (:issue:`27573`)
+- Disallow passing non-keyword arguments to :func:`read_sas` except for ``filepath_or_buffer`` (:issue:`47154`)
+- Disallow passing non-keyword arguments to :func:`read_stata` except for ``filepath_or_buffer`` (:issue:`48128`)
+- Disallow passing non-keyword arguments to :func:`read_xml` except for ``path_or_buffer`` (:issue:`45133`)
 - Disallow passing non-keyword arguments to :meth:`Series.mask` and :meth:`DataFrame.mask` except ``cond`` and ``other`` (:issue:`41580`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.to_stata` except for ``path`` (:issue:`48128`)
 - Disallow passing non-keyword arguments to :meth:`DataFrame.where` and :meth:`Series.where` except for ``cond`` and ``other`` (:issue:`41523`)
@@ -228,16 +242,20 @@ Removal of prior version deprecations/changes
 - Removed ``pandas.util.testing`` in favor of ``pandas.testing`` (:issue:`30745`)
 - Removed :meth:`Series.str.__iter__` (:issue:`28277`)
 - Removed ``pandas.SparseArray`` in favor of :class:`arrays.SparseArray` (:issue:`30642`)
-- Removed ``pandas.SparseSeries`` and ``pandas.SparseDataFrame`` (:issue:`30642`)
+- Removed ``pandas.SparseSeries`` and ``pandas.SparseDataFrame``, including pickle support (:issue:`30642`)
 - Enforced disallowing a string column label into ``times`` in :meth:`DataFrame.ewm` (:issue:`43265`)
 - Enforced disallowing a tuple of column labels into :meth:`.DataFrameGroupBy.__getitem__` (:issue:`30546`)
 - Removed setting Categorical._codes directly (:issue:`41429`)
 - Enforced :meth:`Rolling.count` with ``min_periods=None`` to default to the size of the window (:issue:`31302`)
 - Renamed ``fname`` to ``path`` in :meth:`DataFrame.to_parquet`, :meth:`DataFrame.to_stata` and :meth:`DataFrame.to_feather` (:issue:`30338`)
+- Enforced disallowing indexing a :class:`Series` with a single-item list containing a slice (e.g. ``ser[[slice(0, 2)]]``). Either convert the list to a tuple, or pass the slice directly instead (:issue:`31333`)
 - Enforced the ``display.max_colwidth`` option to not accept negative integers (:issue:`31569`)
 - Removed the ``display.column_space`` option in favor of ``df.to_string(col_space=...)`` (:issue:`47280`)
 - Removed the deprecated method ``mad`` from pandas classes (:issue:`11787`)
 - Removed the deprecated method ``tshift`` from pandas classes (:issue:`11631`)
+- Changed behavior of :class:`DataFrame` constructor given floating-point ``data`` and an integer ``dtype``: when the data cannot be cast losslessly, the floating-point dtype is retained, matching :class:`Series` behavior (:issue:`41170`)
+- Changed behavior of :class:`DataFrame` constructor when passed a ``dtype`` (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (:issue:`41733`)
+- Changed behavior of the :class:`Series` constructor: it will no longer infer a datetime64 or timedelta64 dtype from string entries (:issue:`41731`)
 - Changed behavior of :class:`Index` constructor when passed a ``SparseArray`` or ``SparseDtype`` to retain that dtype instead of casting to ``numpy.ndarray`` (:issue:`43930`)

 .. ---------------------------------------------------------------------------
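The long run of "Disallow passing non-keyword arguments" entries above all enforce the same pattern: one or two leading parameters stay positional and everything else becomes keyword-only. A minimal sketch of that enforcement, assuming a generic decorator (the name `enforce_keyword_only` and the stand-in `concat` below are illustrative, not pandas' actual implementation):

```python
import functools


def enforce_keyword_only(*allowed_positional):
    """Reject positional use of any parameter beyond the allowed ones."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > len(allowed_positional):
                raise TypeError(
                    f"{func.__name__}() takes at most "
                    f"{len(allowed_positional)} positional argument(s): "
                    f"{', '.join(allowed_positional)}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stand-in for pd.concat: only ``objs`` may be positional.
@enforce_keyword_only("objs")
def concat(objs, axis=0, join="outer"):
    return list(objs), axis, join
```

With this sketch, `concat([1, 2], axis=1)` still works, while `concat([1, 2], 1)` raises `TypeError`. In plain Python 3 the same effect is achieved statically with a bare `*` in the signature, e.g. `def concat(objs, *, axis=0, join="outer")`.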

pandas/_libs/lib.pyi (1 addition, 1 deletion)

@@ -158,7 +158,7 @@ def ensure_string_array(
 ) -> npt.NDArray[np.object_]: ...
 def infer_datetimelike_array(
     arr: npt.NDArray[np.object_],
-) -> tuple[str, bool]: ...
+) -> str: ...
 def convert_nans_to_NA(
     arr: npt.NDArray[np.object_],
 ) -> npt.NDArray[np.object_]: ...

pandas/_libs/lib.pyx (15 additions, 35 deletions)

@@ -95,7 +95,6 @@ from pandas._libs.util cimport (
     is_nan,
 )

-from pandas._libs.tslib import array_to_datetime
 from pandas._libs.tslibs import (
     OutOfBoundsDatetime,
     OutOfBoundsTimedelta,
@@ -1583,25 +1582,19 @@ def infer_datetimelike_array(arr: ndarray[object]) -> tuple[str, bool]:
     Returns
     -------
     str: {datetime, timedelta, date, nat, mixed}
-    bool
     """
     cdef:
         Py_ssize_t i, n = len(arr)
         bint seen_timedelta = False, seen_date = False, seen_datetime = False
         bint seen_tz_aware = False, seen_tz_naive = False
-        bint seen_nat = False, seen_str = False
+        bint seen_nat = False
         bint seen_period = False, seen_interval = False
-        list objs = []
         object v

     for i in range(n):
         v = arr[i]
         if isinstance(v, str):
-            objs.append(v)
-            seen_str = True
-
-            if len(objs) == 3:
-                break
+            return "mixed"

         elif v is None or util.is_nan(v):
             # nan or None
@@ -1619,7 +1612,7 @@ def infer_datetimelike_array(arr: ndarray[object]) -> tuple[str, bool]:
             seen_tz_aware = True

             if seen_tz_naive and seen_tz_aware:
-                return "mixed", seen_str
+                return "mixed"
         elif util.is_datetime64_object(v):
             # np.datetime64
             seen_datetime = True
@@ -1635,43 +1628,30 @@ def infer_datetimelike_array(arr: ndarray[object]) -> tuple[str, bool]:
                 seen_interval = True
                 break
         else:
-            return "mixed", seen_str
+            return "mixed"

     if seen_period:
         if is_period_array(arr):
-            return "period", seen_str
-        return "mixed", seen_str
+            return "period"
+        return "mixed"

     if seen_interval:
         if is_interval_array(arr):
-            return "interval", seen_str
-        return "mixed", seen_str
+            return "interval"
+        return "mixed"

     if seen_date and not (seen_datetime or seen_timedelta):
-        return "date", seen_str
+        return "date"
     elif seen_datetime and not seen_timedelta:
-        return "datetime", seen_str
+        return "datetime"
     elif seen_timedelta and not seen_datetime:
-        return "timedelta", seen_str
+        return "timedelta"
+    elif seen_datetime and seen_timedelta:
+        return "mixed"
     elif seen_nat:
-        return "nat", seen_str
+        return "nat"

-    # short-circuit by trying to
-    # actually convert these strings
-    # this is for performance as we don't need to try
-    # convert *every* string array
-    if len(objs):
-        try:
-            # require_iso8601 as in maybe_infer_to_datetimelike
-            array_to_datetime(objs, errors="raise", require_iso8601=True)
-            return "datetime", seen_str
-        except (ValueError, TypeError):
-            pass
-
-    # we are *not* going to infer from strings
-    # for timedelta as too much ambiguity
-
-    return "mixed", seen_str
+    return "mixed"


 cdef inline bint is_timedelta(object o):
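The `lib.pyx` and `lib.pyi` hunks above simplify `infer_datetimelike_array` in two ways: any string element now short-circuits to `"mixed"` (the old path tried to parse up to three sample strings as ISO-8601 datetimes), and the `seen_str` flag disappears, so the function returns a plain `str` instead of a `(str, bool)` tuple. A pure-Python sketch of the new control flow, assuming only a few scalar types for brevity (the real Cython version also handles `np.datetime64`, `Period`, `Interval`, timezone mixing, etc.):

```python
import datetime


def infer_datetimelike_kind(values):
    """Simplified sketch of the post-commit inference logic."""
    seen_datetime = seen_timedelta = seen_date = seen_nat = False
    for v in values:
        if isinstance(v, str):
            return "mixed"  # strings are no longer parsed as datetimes
        elif v is None:
            seen_nat = True  # treat missing values as NaT
        elif isinstance(v, datetime.datetime):
            seen_datetime = True  # check before date: datetime subclasses date
        elif isinstance(v, datetime.timedelta):
            seen_timedelta = True
        elif isinstance(v, datetime.date):
            seen_date = True
        else:
            return "mixed"
    if seen_date and not (seen_datetime or seen_timedelta):
        return "date"
    if seen_datetime and not seen_timedelta:
        return "datetime"
    if seen_timedelta and not seen_datetime:
        return "timedelta"
    if seen_datetime and seen_timedelta:
        return "mixed"
    if seen_nat:
        return "nat"
    return "mixed"
```

Note the new `elif seen_datetime and seen_timedelta: return "mixed"` branch mirrors the added case in the diff: mixing datetimes and timedeltas no longer falls through to the string-parsing fallback.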

pandas/_libs/src/klib/khash.h (19 additions, 18 deletions)

@@ -47,6 +47,16 @@ int main() {
 */

 /*
+  2013-05-02 (0.2.8):
+
+	* Use quadratic probing. When the capacity is power of 2, stepping function
+	  i*(i+1)/2 guarantees to traverse each bucket. It is better than double
+	  hashing on cache performance and is more robust than linear probing. In
+	  theory, double hashing should be more robust than quadratic probing.
+	  However, my implementation is probably not for large hash tables, because
+	  the second hash function is closely tied to the first hash function,
+	  which reduces the effectiveness of double hashing.
+
+	Reference: http://research.cs.vt.edu/AVresearch/hashing/quadratic.php
+
  2011-09-16 (0.2.6):

	* The capacity is a power of 2. This seems to dramatically improve the
@@ -107,7 +117,7 @@ int main() {
  Generic hash table library.
 */

-#define AC_VERSION_KHASH_H "0.2.6"
+#define AC_VERSION_KHASH_H "0.2.8"

 #include <stdlib.h>
 #include <string.h>
@@ -177,7 +187,6 @@ typedef khuint_t khiter_t;
 #define __ac_set_isboth_false(flag, i) __ac_set_isempty_false(flag, i)
 #define __ac_set_isdel_true(flag, i) ((void)0)

-
 // specializations of https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp
 khuint32_t PANDAS_INLINE murmur2_32to32(khuint32_t k){
     const khuint32_t SEED = 0xc70f6907UL;
@@ -252,13 +261,6 @@ khuint32_t PANDAS_INLINE murmur2_64to32(khuint64_t k){
     return murmur2_32_32to32(k1, k2);
 }

-
-#ifdef KHASH_LINEAR
-#define __ac_inc(k, m) 1
-#else
-#define __ac_inc(k, m) (murmur2_32to32(k) | 1) & (m)
-#endif
-
 #define __ac_fsize(m) ((m) < 32? 1 : (m)>>5)

 #ifndef kroundup32
@@ -310,12 +312,12 @@ static const double __ac_HASH_UPPER = 0.77;
 	SCOPE khuint_t kh_get_##name(const kh_##name##_t *h, khkey_t key) \
 	{ \
 		if (h->n_buckets) { \
-			khuint_t inc, k, i, last, mask; \
+			khuint_t k, i, last, mask, step=0; \
 			mask = h->n_buckets - 1; \
 			k = __hash_func(key); i = k & mask; \
-			inc = __ac_inc(k, mask); last = i; /* inc==1 for linear probing */ \
+			last = i; \
 			while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
-				i = (i + inc) & mask; \
+				i = (i + ++step) & mask; \
 				if (i == last) return h->n_buckets; \
 			} \
 			return __ac_iseither(h->flags, i)? h->n_buckets : i; \
@@ -348,11 +350,10 @@ static const double __ac_HASH_UPPER = 0.77;
 				if (kh_is_map) val = h->vals[j]; \
 				__ac_set_isempty_true(h->flags, j); \
 				while (1) { /* kick-out process; sort of like in Cuckoo hashing */ \
-					khuint_t inc, k, i; \
+					khuint_t k, i, step=0; \
 					k = __hash_func(key); \
 					i = k & new_mask; \
-					inc = __ac_inc(k, new_mask); \
-					while (!__ac_isempty(new_flags, i)) i = (i + inc) & new_mask; \
+					while (!__ac_isempty(new_flags, i)) i = (i + (++step)) & new_mask; \
 					__ac_set_isempty_false(new_flags, i); \
 					if (i < h->n_buckets && __ac_iseither(h->flags, i) == 0) { /* kick out the existing element */ \
 						{ khkey_t tmp = h->keys[i]; h->keys[i] = key; key = tmp; } \
@@ -385,14 +386,14 @@ static const double __ac_HASH_UPPER = 0.77;
 			else kh_resize_##name(h, h->n_buckets + 1); /* expand the hash table */ \
 		} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
 		{ \
-			khuint_t inc, k, i, site, last, mask = h->n_buckets - 1; \
+			khuint_t k, i, site, last, mask = h->n_buckets - 1, step=0; \
 			x = site = h->n_buckets; k = __hash_func(key); i = k & mask; \
 			if (__ac_isempty(h->flags, i)) x = i; /* for speed up */ \
 			else { \
-				inc = __ac_inc(k, mask); last = i; \
+				last = i; \
 				while (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \
 					if (__ac_isdel(h->flags, i)) site = i; \
-					i = (i + inc) & mask; \
+					i = (i + (++step)) & mask; \
 					if (i == last) { x = site; break; } \
 				} \
 				if (x == h->n_buckets) { \
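The khash.h hunks above replace the `__ac_inc` double-hashing increment with quadratic probing: each probe advances by a growing step (`i = (i + (++step)) & mask`), so the offsets from the starting bucket are the triangular numbers k*(k+1)/2, which visit every bucket exactly once when the capacity is a power of two (the property the new changelog entry cites). A small Python check of that coverage property, mirroring the C loop:

```python
def probe_sequence(h, n_buckets):
    """Buckets visited by the quadratic scheme ``i = (i + ++step) & mask``.

    The cumulative steps 1, 2, 3, ... put the k-th probe at
    ``(start + k*(k+1)/2) & mask``; for power-of-two ``n_buckets`` these
    offsets are pairwise distinct, so the probe covers the whole table.
    """
    mask = n_buckets - 1
    i = h & mask  # starting bucket, as in `i = k & mask`
    step = 0
    seen = []
    for _ in range(n_buckets):
        seen.append(i)
        step += 1
        i = (i + step) & mask  # the `(i + (++step)) & mask` update
    return seen


# Full traversal for several power-of-two capacities:
for n in (8, 64, 1024):
    assert len(set(probe_sequence(12345, n))) == n
```

With the old `KHASH_LINEAR`/murmur-based `__ac_inc` removed, the `if (i == last)` table-full check still works because the sequence is a permutation of the buckets.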

pandas/compat/pickle_compat.py (1 addition, 70 deletions)

@@ -7,11 +7,7 @@
 import copy
 import io
 import pickle as pkl
-from typing import (
-    TYPE_CHECKING,
-    Generator,
-)
-import warnings
+from typing import Generator

 import numpy as np
@@ -26,12 +22,6 @@
 )
 from pandas.core.internals import BlockManager

-if TYPE_CHECKING:
-    from pandas import (
-        DataFrame,
-        Series,
-    )
-

 def load_reduce(self):
     stack = self.stack
@@ -68,49 +58,6 @@ def load_reduce(self):
         raise


-_sparse_msg = """\
-
-Loading a saved '{cls}' as a {new} with sparse values.
-'{cls}' is now removed. You should re-save this dataset in its new format.
-"""
-
-
-class _LoadSparseSeries:
-    # To load a SparseSeries as a Series[Sparse]
-
-    # https://github.com/python/mypy/issues/1020
-    # error: Incompatible return type for "__new__" (returns "Series", but must return
-    # a subtype of "_LoadSparseSeries")
-    def __new__(cls) -> Series:  # type: ignore[misc]
-        from pandas import Series
-
-        warnings.warn(
-            _sparse_msg.format(cls="SparseSeries", new="Series"),
-            FutureWarning,
-            stacklevel=6,
-        )
-
-        return Series(dtype=object)
-
-
-class _LoadSparseFrame:
-    # To load a SparseDataFrame as a DataFrame[Sparse]
-
-    # https://github.com/python/mypy/issues/1020
-    # error: Incompatible return type for "__new__" (returns "DataFrame", but must
-    # return a subtype of "_LoadSparseFrame")
-    def __new__(cls) -> DataFrame:  # type: ignore[misc]
-        from pandas import DataFrame
-
-        warnings.warn(
-            _sparse_msg.format(cls="SparseDataFrame", new="DataFrame"),
-            FutureWarning,
-            stacklevel=6,
-        )
-
-        return DataFrame()
-
-
 # If classes are moved, provide compat here.
 _class_locations_map = {
     ("pandas.core.sparse.array", "SparseArray"): ("pandas.core.arrays", "SparseArray"),
@@ -144,14 +91,6 @@
         "pandas.core.arrays.sparse",
         "SparseArray",
     ),
-    ("pandas.sparse.series", "SparseSeries"): (
-        "pandas.compat.pickle_compat",
-        "_LoadSparseSeries",
-    ),
-    ("pandas.sparse.frame", "SparseDataFrame"): (
-        "pandas.core.sparse.frame",
-        "_LoadSparseFrame",
-    ),
     ("pandas.indexes.base", "_new_Index"): ("pandas.core.indexes.base", "_new_Index"),
     ("pandas.indexes.base", "Index"): ("pandas.core.indexes.base", "Index"),
     ("pandas.indexes.numeric", "Int64Index"): (
@@ -183,14 +122,6 @@
         "pandas.core.indexes.numeric",
         "Float64Index",
     ),
-    ("pandas.core.sparse.series", "SparseSeries"): (
-        "pandas.compat.pickle_compat",
-        "_LoadSparseSeries",
-    ),
-    ("pandas.core.sparse.frame", "SparseDataFrame"): (
-        "pandas.compat.pickle_compat",
-        "_LoadSparseFrame",
-    ),
 }
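With the `_LoadSparseSeries`/`_LoadSparseFrame` shims gone, `_class_locations_map` only redirects classes that still exist under a new module path; pickles of the removed classes now fail to load. A sketch of how such a map plugs into unpickling via `Unpickler.find_class` (the map contents and class names here are made up for illustration; pandas' own loader is more involved):

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical compat map in the spirit of _class_locations_map:
# old (module, name) -> new (module, name).
_compat_map = {
    ("old_pkg.arrays", "SparseArray"): ("collections", "OrderedDict"),
}


class CompatUnpickler(pickle.Unpickler):
    """Redirect moved classes at load time.

    Entries deleted from the map (as with the SparseSeries/SparseDataFrame
    shims in this commit) simply fall through to the normal import, so a
    stale pickle fails loudly instead of loading a placeholder object.
    """

    def find_class(self, module, name):
        module, name = _compat_map.get((module, name), (module, name))
        return super().find_class(module, name)


# The remapped lookup resolves to the new location:
resolved = CompatUnpickler(io.BytesIO(b"")).find_class("old_pkg.arrays", "SparseArray")
assert resolved is OrderedDict
```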

pandas/core/arrays/categorical.py (1 addition, 1 deletion)

@@ -133,7 +133,7 @@

 def _cat_compare_op(op):
     opname = f"__{op.__name__}__"
-    fill_value = True if op is operator.ne else False
+    fill_value = op is operator.ne

     @unpack_zerodim_and_defer(opname)
     def func(self, other):
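The one-line `categorical.py` change is a pure simplification: `True if cond else False` is just `cond` whenever the condition is already a `bool`, as an `is` comparison always is:

```python
import operator

# The ternary with boolean literals is equivalent to the condition itself:
for op in (operator.ne, operator.eq, operator.lt):
    old_style = True if op is operator.ne else False
    new_style = op is operator.ne
    assert old_style == new_style and isinstance(new_style, bool)
```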
