Commit f3b6ed5

Merge branch 'master' into read-tar-archives
2 parents c6573ef + 85c221a commit f3b6ed5

File tree

137 files changed

+2236
-1651
lines changed

.github/workflows/posix.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -31,6 +31,7 @@ jobs:
3131
[actions-38-slow.yaml, "slow", "", "", "", "", ""],
3232
[actions-38-locale.yaml, "not slow and not network", "language-pack-zh-hans xsel", "zh_CN.utf8", "zh_CN.utf8", "", ""],
3333
[actions-39-slow.yaml, "slow", "", "", "", "", ""],
34+
[actions-pypy-38.yaml, "not slow and not clipboard", "", "", "", "", ""],
3435
[actions-39-numpydev.yaml, "not slow and not network", "xsel", "", "", "deprecate", "-W error"],
3536
[actions-39.yaml, "not slow and not clipboard", "", "", "", "", ""]
3637
]

ci/deps/actions-pypy-38.yaml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
name: pandas-dev
2+
channels:
3+
- conda-forge
4+
dependencies:
5+
# TODO: Add the rest of the dependencies in here
6+
# once the other plentiful failures/segfaults
7+
# with base pandas have been dealt with
8+
- python=3.8
9+
10+
# tools
11+
- cython>=0.29.24
12+
- pytest>=6.0
13+
- pytest-cov
14+
- pytest-xdist>=1.31
15+
- hypothesis>=5.5.3
16+
17+
# required
18+
- numpy
19+
- python-dateutil
20+
- pytz

doc/source/whatsnew/v1.3.5.rst

Lines changed: 3 additions & 19 deletions
@@ -1,6 +1,6 @@
11
.. _whatsnew_135:
22

3-
What's new in 1.3.5 (November ??, 2021)
3+
What's new in 1.3.5 (December 12, 2021)
44
---------------------------------------
55

66
These are the changes in pandas 1.3.5. See :ref:`release` for a full changelog
@@ -16,29 +16,13 @@ Fixed regressions
1616
~~~~~~~~~~~~~~~~~
1717
- Fixed regression in :meth:`Series.equals` when comparing floats with dtype object to None (:issue:`44190`)
1818
- Fixed regression in :func:`merge_asof` raising error when array was supplied as join key (:issue:`42844`)
19+
- Fixed regression when resampling :class:`DataFrame` with :class:`DatetimeIndex` with empty groups and ``uint8``, ``uint16`` or ``uint32`` columns incorrectly raising ``RuntimeError`` (:issue:`43329`)
1920
- Fixed regression in creating a :class:`DataFrame` from a timezone-aware :class:`Timestamp` scalar near a Daylight Savings Time transition (:issue:`42505`)
2021
- Fixed performance regression in :func:`read_csv` (:issue:`44106`)
2122
- Fixed regression in :meth:`Series.duplicated` and :meth:`Series.drop_duplicates` when Series has :class:`Categorical` dtype with boolean categories (:issue:`44351`)
2223
- Fixed regression in :meth:`.GroupBy.sum` with ``timedelta64[ns]`` dtype containing ``NaT`` failing to treat that value as NA (:issue:`42659`)
23-
-
24+
- Fixed regression in :meth:`.RollingGroupby.cov` and :meth:`.RollingGroupby.corr` when ``other`` had the same shape as each group would incorrectly return superfluous groups in the result (:issue:`42915`)
2425

25-
.. ---------------------------------------------------------------------------
26-
27-
.. _whatsnew_135.bug_fixes:
28-
29-
Bug fixes
30-
~~~~~~~~~
31-
-
32-
-
33-
34-
.. ---------------------------------------------------------------------------
35-
36-
.. _whatsnew_135.other:
37-
38-
Other
39-
~~~~~
40-
-
41-
-
4226

4327
.. ---------------------------------------------------------------------------
4428

doc/source/whatsnew/v1.4.0.rst

Lines changed: 13 additions & 3 deletions
@@ -219,7 +219,7 @@ Other enhancements
219219
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like (:issue:`41021`)
220220
- :class:`ExcelWriter` argument ``if_sheet_exists="overlay"`` option added (:issue:`40231`)
221221
- :meth:`read_excel` now accepts a ``decimal`` argument that allows the user to specify the decimal point when parsing string columns to numeric (:issue:`14403`)
222-
- :meth:`.GroupBy.mean` now supports `Numba <http://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`43731`)
222+
- :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, and :meth:`.GroupBy.var` now support `Numba <http://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`)
223223
- :meth:`Timestamp.isoformat` now handles the ``timespec`` argument from the base :class:`datetime` class (:issue:`26131`)
224224
- :meth:`NaT.to_numpy` ``dtype`` argument is now respected, so ``np.timedelta64`` can be returned (:issue:`44460`)
225225
- New option ``display.max_dir_items`` customizes the number of columns added to :meth:`DataFrame.__dir__` and suggested for tab completion (:issue:`37996`)
@@ -229,6 +229,8 @@ Other enhancements
229229
- :meth:`Series.info` has been added, for compatibility with :meth:`DataFrame.info` (:issue:`5167`)
230230
- Implemented :meth:`IntervalArray.min`, :meth:`IntervalArray.max`, as a result of which ``min`` and ``max`` now work for :class:`IntervalIndex`, :class:`Series` and :class:`DataFrame` with ``IntervalDtype`` (:issue:`44746`)
231231
- :meth:`UInt64Index.map` now retains ``dtype`` where possible (:issue:`44609`)
232+
- :meth:`read_json` can now parse unsigned long long integers (:issue:`26068`)
233+
- :meth:`DataFrame.take` now raises a ``TypeError`` when passed a scalar for the indexer (:issue:`42875`)
232234
-
233235

234236

@@ -547,7 +549,7 @@ Performance improvements
547549
- Performance improvement in :meth:`.GroupBy.sample`, especially when ``weights`` argument provided (:issue:`34483`)
548550
- Performance improvement when converting non-string arrays to string arrays (:issue:`34483`)
549551
- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions (:issue:`41598`)
550-
- Performance improvement in constructing :class:`DataFrame` objects (:issue:`42631`, :issue:`43142`, :issue:`43147`, :issue:`43307`, :issue:`43144`)
552+
- Performance improvement in constructing :class:`DataFrame` objects (:issue:`42631`, :issue:`43142`, :issue:`43147`, :issue:`43307`, :issue:`43144`, :issue:`44826`)
551553
- Performance improvement in :meth:`GroupBy.shift` when ``fill_value`` argument is provided (:issue:`26615`)
552554
- Performance improvement in :meth:`DataFrame.corr` for ``method=pearson`` on data without missing values (:issue:`40956`)
553555
- Performance improvement in some :meth:`GroupBy.apply` operations (:issue:`42992`, :issue:`43578`)
@@ -644,6 +646,7 @@ Numeric
644646
- Bug in arithmetic operations involving :class:`RangeIndex` where the result would have the incorrect ``name`` (:issue:`43962`)
645647
- Bug in arithmetic operations involving :class:`Series` where the result could have the incorrect ``name`` when the operands have matching NA or matching tuple names (:issue:`44459`)
646648
- Bug in division with ``IntegerDtype`` or ``BooleanDtype`` array and NA scalar incorrectly raising (:issue:`44685`)
649+
- Bug in multiplying a :class:`Series` with ``FloatingDtype`` with a timedelta-like scalar incorrectly raising (:issue:`44772`)
647650
-
648651

649652
Conversion
@@ -657,7 +660,7 @@ Conversion
657660

658661
Strings
659662
^^^^^^^
660-
-
663+
- Fixed bug in checking for ``string[pyarrow]`` dtype incorrectly raising an ImportError when pyarrow is not installed (:issue:`44327`)
661664
-
662665

663666
Interval
@@ -696,8 +699,10 @@ Indexing
696699
- Bug in :meth:`DataFrame.loc.__getitem__` incorrectly raising ``KeyError`` when selecting a single column with a boolean key (:issue:`44322`).
697700
- Bug in setting :meth:`DataFrame.iloc` with a single ``ExtensionDtype`` column and setting 2D values e.g. ``df.iloc[:] = df.values`` incorrectly raising (:issue:`44514`)
698701
- Bug in indexing on columns with ``loc`` or ``iloc`` using a slice with a negative step with ``ExtensionDtype`` columns incorrectly raising (:issue:`44551`)
702+
- Bug in :meth:`DataFrame.loc.__setitem__` changing dtype when indexer was completely ``False`` (:issue:`37550`)
699703
- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`)
700704
- Bug in :meth:`IntervalIndex.get_indexer_non_unique` not handling targets of ``dtype`` 'object' with NaNs correctly (:issue:`44482`)
705+
- Fixed regression where a single column ``np.matrix`` was no longer coerced to a 1d ``np.ndarray`` when added to a :class:`DataFrame` (:issue:`42376`)
701706
-
702707

703708
Missing
@@ -706,13 +711,15 @@ Missing
706711
- Bug in :meth:`DataFrame.fillna` not replacing missing values when using a dict-like ``value`` and duplicate column names (:issue:`43476`)
707712
- Bug in constructing a :class:`DataFrame` with a dictionary ``np.datetime64`` as a value and ``dtype='timedelta64[ns]'``, or vice-versa, incorrectly casting instead of raising (:issue:`??`)
708713
- Bug in :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` with ``inplace=True`` not writing to the underlying array(s) in-place (:issue:`44749`)
714+
- Bug in :meth:`Index.fillna` incorrectly returning an un-filled :class:`Index` when NA values are present and ``downcast`` argument is specified. This now raises ``NotImplementedError`` instead; do not pass ``downcast`` argument (:issue:`44873`)
709715
-
710716

711717
MultiIndex
712718
^^^^^^^^^^
713719
- Bug in :meth:`MultiIndex.get_loc` where the first level is a :class:`DatetimeIndex` and a string key is passed (:issue:`42465`)
714720
- Bug in :meth:`MultiIndex.reindex` when passing a ``level`` that corresponds to an ``ExtensionDtype`` level (:issue:`42043`)
715721
- Bug in :meth:`MultiIndex.get_loc` raising ``TypeError`` instead of ``KeyError`` on nested tuple (:issue:`42440`)
722+
- Bug in :meth:`MultiIndex.union` setting wrong ``sortorder`` causing errors in subsequent indexing operations with slices (:issue:`44752`)
716723
- Bug in :meth:`MultiIndex.putmask` where the other value was also a :class:`MultiIndex` (:issue:`43212`)
717724
-
718725

@@ -747,6 +754,9 @@ I/O
747754
- Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from a nullable integer type (:issue:`44079`)
748755
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`)
749756
- Bug in :func:`read_csv` when passing simultaneously a parser in ``date_parser`` and ``parse_dates=False``, the parsing was still called (:issue:`44366`)
757+
- Bug in :func:`read_csv` silently ignoring errors when failing to create a memory-mapped file (:issue:`44766`)
758+
- Bug in :func:`read_csv` when passing a ``tempfile.SpooledTemporaryFile`` opened in binary mode (:issue:`44748`)
759+
-
750760

751761
Period
752762
^^^^^^

pandas/_libs/interval.pyx

Lines changed: 3 additions & 3 deletions
@@ -516,9 +516,9 @@ def intervals_to_interval_bounds(ndarray intervals, bint validate_closed=True):
516516
517517
Returns
518518
-------
519-
tuple of tuples
520-
left : (ndarray, object, array)
521-
right : (ndarray, object, array)
519+
tuple of
520+
left : ndarray
521+
right : ndarray
522522
closed: str
523523
"""
524524
cdef:

pandas/_libs/join.pyi

Lines changed: 6 additions & 6 deletions
@@ -55,39 +55,39 @@ def asof_join_backward_on_X_by_Y(
5555
left_by_values: np.ndarray, # by_t[:]
5656
right_by_values: np.ndarray, # by_t[:]
5757
allow_exact_matches: bool = ...,
58-
tolerance=...,
58+
tolerance: np.number | int | float | None = ...,
5959
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
6060
def asof_join_forward_on_X_by_Y(
6161
left_values: np.ndarray, # asof_t[:]
6262
right_values: np.ndarray, # asof_t[:]
6363
left_by_values: np.ndarray, # by_t[:]
6464
right_by_values: np.ndarray, # by_t[:]
6565
allow_exact_matches: bool = ...,
66-
tolerance=...,
66+
tolerance: np.number | int | float | None = ...,
6767
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
6868
def asof_join_nearest_on_X_by_Y(
6969
left_values: np.ndarray, # asof_t[:]
7070
right_values: np.ndarray, # asof_t[:]
7171
left_by_values: np.ndarray, # by_t[:]
7272
right_by_values: np.ndarray, # by_t[:]
7373
allow_exact_matches: bool = ...,
74-
tolerance=...,
74+
tolerance: np.number | int | float | None = ...,
7575
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
7676
def asof_join_backward(
7777
left_values: np.ndarray, # asof_t[:]
7878
right_values: np.ndarray, # asof_t[:]
7979
allow_exact_matches: bool = ...,
80-
tolerance=...,
80+
tolerance: np.number | int | float | None = ...,
8181
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
8282
def asof_join_forward(
8383
left_values: np.ndarray, # asof_t[:]
8484
right_values: np.ndarray, # asof_t[:]
8585
allow_exact_matches: bool = ...,
86-
tolerance=...,
86+
tolerance: np.number | int | float | None = ...,
8787
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
8888
def asof_join_nearest(
8989
left_values: np.ndarray, # asof_t[:]
9090
right_values: np.ndarray, # asof_t[:]
9191
allow_exact_matches: bool = ...,
92-
tolerance=...,
92+
tolerance: np.number | int | float | None = ...,
9393
) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...

pandas/_libs/missing.pyi

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
import numpy as np
2+
from numpy import typing as npt
3+
4+
class NAType: ...
5+
6+
NA: NAType
7+
8+
def is_matching_na(
9+
left: object, right: object, nan_matches_none: bool = ...
10+
) -> bool: ...
11+
def isposinf_scalar(val: object) -> bool: ...
12+
def isneginf_scalar(val: object) -> bool: ...
13+
def checknull(val: object, inf_as_na: bool = ...) -> bool: ...
14+
def isnaobj(arr: np.ndarray, inf_as_na: bool = ...) -> npt.NDArray[np.bool_]: ...
15+
def is_numeric_na(values: np.ndarray) -> npt.NDArray[np.bool_]: ...

pandas/_libs/ops.pyi

Lines changed: 5 additions & 4 deletions
@@ -1,6 +1,7 @@
11
from typing import (
22
Any,
33
Callable,
4+
Iterable,
45
Literal,
56
overload,
67
)
@@ -35,15 +36,15 @@ def vec_binop(
3536
@overload
3637
def maybe_convert_bool(
3738
arr: npt.NDArray[np.object_],
38-
true_values=...,
39-
false_values=...,
39+
true_values: Iterable = ...,
40+
false_values: Iterable = ...,
4041
convert_to_masked_nullable: Literal[False] = ...,
4142
) -> tuple[np.ndarray, None]: ...
4243
@overload
4344
def maybe_convert_bool(
4445
arr: npt.NDArray[np.object_],
45-
true_values=...,
46-
false_values=...,
46+
true_values: Iterable = ...,
47+
false_values: Iterable = ...,
4748
*,
4849
convert_to_masked_nullable: Literal[True],
4950
) -> tuple[np.ndarray, np.ndarray]: ...
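
The two `maybe_convert_bool` overloads above use `Literal[False]` and `Literal[True]` so that a type checker can pick the return type from the value of `convert_to_masked_nullable`. A minimal sketch of the same pattern in plain Python follows. The function `to_bools` and its `with_mask` flag are hypothetical and are not part of pandas; they only illustrate the overload technique.

```python
from __future__ import annotations

from typing import Iterable, Literal, overload


@overload
def to_bools(
    values: list[str],
    true_values: Iterable[str] = ...,
    *,
    with_mask: Literal[False] = ...,
) -> tuple[list[bool], None]: ...
@overload
def to_bools(
    values: list[str],
    true_values: Iterable[str] = ...,
    *,
    with_mask: Literal[True],
) -> tuple[list[bool], list[bool]]: ...
def to_bools(values, true_values=("true",), *, with_mask=False):
    """Hypothetical helper: convert strings to bools, optionally with a mask.

    The mask marks empty strings as missing. Only the typing pattern is the
    point here; the conversion rules are invented for the example.
    """
    truthy = {v.lower() for v in true_values}
    result = [v.lower() in truthy for v in values]
    mask = [v == "" for v in values] if with_mask else None
    return result, mask
```

With these overloads, a checker would infer `None` for the second tuple element when `with_mask` is omitted and a list when it is passed as `True`.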

pandas/_libs/ops_dispatch.pyx

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ UFUNC_ALIASES = {
3434
"true_divide": "truediv",
3535
"power": "pow",
3636
"remainder": "mod",
37-
"divide": "div",
37+
"divide": "truediv",
3838
"equal": "eq",
3939
"not_equal": "ne",
4040
"less": "lt",
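
The change above maps the `divide` ufunc name to `truediv` instead of `div`. One plausible reason is that Python 3 objects define `__truediv__` but no `__div__`, so a dunder name built from `div` would never resolve. The sketch below assumes the alias table is used to build a dunder method name; the trimmed table and the helper `dunder_for` are hypothetical and written only for illustration.

```python
# Hypothetical, trimmed-down alias table modeled on the entries in the diff.
UFUNC_ALIASES = {
    "true_divide": "truediv",
    "divide": "truediv",
    "power": "pow",
    "remainder": "mod",
}


def dunder_for(ufunc_name: str, reflected: bool = False) -> str:
    """Build the dunder method name that a ufunc name would dispatch to."""
    # Fall back to the ufunc name itself when no alias is registered.
    op_name = UFUNC_ALIASES.get(ufunc_name, ufunc_name)
    prefix = "r" if reflected else ""
    return f"__{prefix}{op_name}__"


# Python 3 floats expose __truediv__ but not the Python 2 era __div__.
has_truediv = hasattr(float, dunder_for("divide"))
has_div = hasattr(float, "__div__")
```

Under this assumption, `dunder_for("divide")` yields `"__truediv__"`, which exists on built-in numeric types, while the old alias would have produced a name that does not.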

pandas/_libs/src/ujson/lib/ultrajson.h

Lines changed: 1 addition & 0 deletions
@@ -297,6 +297,7 @@ typedef struct __JSONObjectDecoder {
297297
JSOBJ (*endArray)(void *prv, JSOBJ obj);
298298
JSOBJ (*newInt)(void *prv, JSINT32 value);
299299
JSOBJ (*newLong)(void *prv, JSINT64 value);
300+
JSOBJ (*newUnsignedLong)(void *prv, JSUINT64 value);
300301
JSOBJ (*newDouble)(void *prv, double value);
301302
void (*releaseObject)(void *prv, JSOBJ obj, void *decoder);
302303
JSPFN_MALLOC malloc;

pandas/_libs/src/ujson/lib/ultrajsondec.c

Lines changed: 15 additions & 15 deletions
@@ -116,8 +116,8 @@ JSOBJ FASTCALL_MSVC decodePreciseFloat(struct DecoderState *ds) {
116116

117117
JSOBJ FASTCALL_MSVC decode_numeric(struct DecoderState *ds) {
118118
int intNeg = 1;
119-
int mantSize = 0;
120119
JSUINT64 intValue;
120+
JSUINT64 prevIntValue;
121121
int chr;
122122
int decimalCount = 0;
123123
double frcValue = 0.0;
@@ -134,10 +134,10 @@ JSOBJ FASTCALL_MSVC decode_numeric(struct DecoderState *ds) {
134134
} else if (*(offset) == '-') {
135135
offset++;
136136
intNeg = -1;
137+
overflowLimit = LLONG_MIN;
137138
if (*(offset) == 'I') {
138139
goto DECODE_INF;
139140
}
140-
overflowLimit = LLONG_MIN;
141141
}
142142

143143
// Scan integer part
@@ -157,19 +157,18 @@ JSOBJ FASTCALL_MSVC decode_numeric(struct DecoderState *ds) {
157157
case '7':
158158
case '8':
159159
case '9': {
160-
// FIXME: Check for arithmetic overflow here
161-
// PERF: Don't do 64-bit arithmetic here unless we know we have
162-
// to
163-
intValue = intValue * 10ULL + (JSLONG)(chr - 48);
164-
165-
if (intValue > overflowLimit) {
166-
return SetError(ds, -1, overflowLimit == LLONG_MAX
167-
? "Value is too big"
168-
: "Value is too small");
160+
// PERF: Don't do 64-bit arithmetic here unless we have to
161+
prevIntValue = intValue;
162+
intValue = intValue * 10ULL + (JSLONG) (chr - 48);
163+
164+
if (intNeg == 1 && prevIntValue > intValue) {
165+
return SetError(ds, -1, "Value is too big!");
166+
} else if (intNeg == -1 && intValue > overflowLimit) {
167+
return SetError(ds, -1, overflowLimit == LLONG_MAX ?
168+
"Value is too big!" : "Value is too small");
169169
}
170170

171171
offset++;
172-
mantSize++;
173172
break;
174173
}
175174
case '.': {
@@ -196,11 +195,12 @@ JSOBJ FASTCALL_MSVC decode_numeric(struct DecoderState *ds) {
196195
ds->lastType = JT_INT;
197196
ds->start = offset;
198197

199-
if ((intValue >> 31)) {
198+
if (intNeg == 1 && (intValue & 0x8000000000000000ULL) != 0)
199+
return ds->dec->newUnsignedLong(ds->prv, intValue);
200+
else if ((intValue >> 31))
200201
return ds->dec->newLong(ds->prv, (JSINT64)(intValue * (JSINT64)intNeg));
201-
} else {
202+
else
202203
return ds->dec->newInt(ds->prv, (JSINT32)(intValue * intNeg));
203-
}
204204

205205
DECODE_FRACTION:
206206

pandas/_libs/src/ujson/python/JSONtoObj.c

Lines changed: 6 additions & 1 deletion
@@ -479,6 +479,10 @@ JSOBJ Object_newLong(void *prv, JSINT64 value) {
479479
return PyLong_FromLongLong(value);
480480
}
481481

482+
JSOBJ Object_newUnsignedLong(void *prv, JSUINT64 value) {
483+
return PyLong_FromUnsignedLongLong(value);
484+
}
485+
482486
JSOBJ Object_newDouble(void *prv, double value) {
483487
return PyFloat_FromDouble(value);
484488
}
@@ -508,7 +512,8 @@ PyObject *JSONToObj(PyObject *self, PyObject *args, PyObject *kwargs) {
508512
Object_newTrue, Object_newFalse, Object_newNull,
509513
Object_newPosInf, Object_newNegInf, Object_newObject,
510514
Object_endObject, Object_newArray, Object_endArray,
511-
Object_newInteger, Object_newLong, Object_newDouble,
515+
Object_newInteger, Object_newLong, Object_newUnsignedLong,
516+
Object_newDouble,
512517
Object_releaseObject, PyObject_Malloc, PyObject_Free,
513518
PyObject_Realloc};
514519
