Commit dc4114a

Merge remote-tracking branch 'upstream/master' into styler_apply
2 parents 6657841 + f79468b commit dc4114a

95 files changed: 2165 additions and 1837 deletions

.pre-commit-config.yaml

Lines changed: 8 additions & 3 deletions
@@ -24,10 +24,10 @@ repos:
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v2.7.4
+  rev: v2.9.0
   hooks:
   - id: pyupgrade
-    args: [--py37-plus]
+    args: [--py37-plus, --keep-runtime-typing]
 - repo: https://github.com/pre-commit/pygrep-hooks
   rev: v1.7.0
   hooks:
@@ -192,6 +192,11 @@ repos:
     files: ^pandas/
     exclude: ^pandas/tests/
 - repo: https://github.com/MarcoGorelli/no-string-hints
-  rev: v0.1.6
+  rev: v0.1.7
   hooks:
   - id: no-string-hints
+- repo: https://github.com/MarcoGorelli/abs-imports
+  rev: v0.1.2
+  hooks:
+  - id: abs-imports
+    files: ^pandas/

asv_bench/benchmarks/io/csv.py

Lines changed: 2 additions & 2 deletions
@@ -84,8 +84,8 @@ class ToCSVIndexes(BaseIO):
     def _create_df(rows, cols):
         index_cols = {
             "index1": np.random.randint(0, rows, rows),
-            "index2": np.full(rows, 1, dtype=np.int),
-            "index3": np.full(rows, 1, dtype=np.int),
+            "index2": np.full(rows, 1, dtype=int),
+            "index3": np.full(rows, 1, dtype=int),
         }
         data_cols = {
             f"col{i}": np.random.uniform(0, 100000.0, rows) for i in range(cols)
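
The change above swaps `np.int` for the builtin `int`. `np.int` was only an alias for the builtin; NumPy deprecated it in 1.20 and removed it in 1.24, so the two spellings requested the same dtype. A minimal sketch of the replacement spelling, assuming NumPy is installed (the `rows` value here is illustrative, not taken from the benchmark):

```python
import numpy as np

rows = 4  # illustrative size, not the benchmark's parameter

# Passing the builtin `int` selects NumPy's default integer dtype,
# which is what the removed `np.int` alias used to resolve to.
index2 = np.full(rows, 1, dtype=int)

print(index2.dtype == np.dtype(int))  # True
print(index2.tolist())                # [1, 1, 1, 1]
```

The concrete width of the default integer dtype depends on the platform, so the sketch compares against `np.dtype(int)` rather than a fixed name.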

doc/source/whatsnew/v1.2.2.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Fixed regressions

 - Fixed regression in :func:`read_excel` that caused it to raise ``AttributeError`` when checking version of older xlrd versions (:issue:`38955`)
 - Fixed regression in :class:`DataFrame` constructor reordering element when construction from datetime ndarray with dtype not ``"datetime64[ns]"`` (:issue:`39422`)
+- Fixed regression in :class:`DataFrame.astype` and :class:`Series.astype` not casting to bytes dtype (:issue:`39474`)
 - Fixed regression in :meth:`~DataFrame.to_pickle` failing to create bz2/xz compressed pickle files with ``protocol=5`` (:issue:`39002`)
 - Fixed regression in :func:`pandas.testing.assert_series_equal` and :func:`pandas.testing.assert_frame_equal` always raising ``AssertionError`` when comparing extension dtypes (:issue:`39410`)
 - Fixed regression in :meth:`~DataFrame.to_csv` opening ``codecs.StreamWriter`` in binary mode instead of in text mode and ignoring user-provided ``mode`` (:issue:`39247`)

doc/source/whatsnew/v1.3.0.rst

Lines changed: 10 additions & 0 deletions
@@ -338,8 +338,12 @@ Indexing
 - Bug in setting ``timedelta64`` values into numeric :class:`Series` failing to cast to object dtype (:issue:`39086`)
 - Bug in setting :class:`Interval` values into a :class:`Series` or :class:`DataFrame` with mismatched :class:`IntervalDtype` incorrectly casting the new values to the existing dtype (:issue:`39120`)
 - Bug in setting ``datetime64`` values into a :class:`Series` with integer-dtype incorrect casting the datetime64 values to integers (:issue:`39266`)
+- Bug in :meth:`Index.get_loc` not raising ``KeyError`` when method is specified for ``NaN`` value when ``NaN`` is not in :class:`Index` (:issue:`39382`)
 - Bug in incorrectly raising in :meth:`Index.insert`, when setting a new column that cannot be held in the existing ``frame.columns``, or in :meth:`Series.reset_index` or :meth:`DataFrame.reset_index` instead of casting to a compatible dtype (:issue:`39068`)
 - Bug in :meth:`RangeIndex.append` where a single object of length 1 was concatenated incorrectly (:issue:`39401`)
+- Bug in setting ``numpy.timedelta64`` values into an object-dtype :class:`Series` using a boolean indexer (:issue:`39488`)
+- Bug in setting numeric values into a into a boolean-dtypes :class:`Series` using ``at`` or ``iat`` failing to cast to object-dtype (:issue:`39582`)
+-

 Missing
 ^^^^^^^
@@ -412,7 +416,9 @@ Reshaping
 - :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`)
 - Bug in :meth:`DataFrame.join` not assigning values correctly when having :class:`MultiIndex` where at least one dimension is from dtype ``Categorical`` with non-alphabetically sorted categories (:issue:`38502`)
 - :meth:`Series.value_counts` and :meth:`Series.mode` return consistent keys in original order (:issue:`12679`, :issue:`11227` and :issue:`39007`)
+- Bug in :meth:`DataFrame.stack` not handling ``NaN`` in :class:`MultiIndex` columns correct (:issue:`39481`)
 - Bug in :meth:`DataFrame.apply` would give incorrect results when used with a string argument and ``axis=1`` when the axis argument was not supported and now raises a ``ValueError`` instead (:issue:`39211`)
+- Bug in :meth:`DataFrame.sort_values` not reshaping index correctly after sorting on columns, when ``ignore_index=True`` (:issue:`39464`)
 - Bug in :meth:`DataFrame.append` returning incorrect dtypes with combinations of ``ExtensionDtype`` dtypes (:issue:`39454`)

 Sparse
@@ -434,7 +440,11 @@ Other
 - Bug in :class:`Index` constructor sometimes silently ignorning a specified ``dtype`` (:issue:`38879`)
 - Bug in constructing a :class:`Series` from a list and a :class:`PandasDtype` (:issue:`39357`)
 - Bug in :class:`Styler` which caused CSS to duplicate on multiple renders. (:issue:`39395`)
+- :meth:`Index.where` behavior now mirrors :meth:`Index.putmask` behavior, i.e. ``index.where(mask, other)`` matches ``index.putmask(~mask, other)`` (:issue:`39412`)
 - Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`)
+- Bug in :class:`Styler` where ``subset`` arg in methods raised an error for some valid multiindex slices (:issue:`33562`)
+-
+-

 .. ---------------------------------------------------------------------------

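
The "Other" entry for :issue:`39412` states that ``index.where(mask, other)`` matches ``index.putmask(~mask, other)``. A minimal sketch of that equivalence for a simple integer index, assuming pandas and NumPy are installed; the index values, mask, and fill value are made up for illustration, and the sketch does not exercise the dtype-casting cases the issue is about:

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])
other = 0

# where() keeps entries where the mask is True and replaces the rest.
via_where = index.where(mask, other)

# putmask() replaces entries where its mask is True, so the mask is inverted.
via_putmask = index.putmask(~mask, other)

print(list(via_where))                # [10, 0, 30, 0]
print(via_where.equals(via_putmask))  # True
```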

pandas/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@
 import pandas.arrays

 # use the closest tagged version if possible
-from ._version import get_versions
+from pandas._version import get_versions

 v = get_versions()
 __version__ = v.get("closest-tag", v["version"])

pandas/_libs/index_class_helper.pxi.in

Lines changed: 7 additions & 0 deletions
@@ -57,7 +57,14 @@ cdef class {{name}}Engine(IndexEngine):
             with warnings.catch_warnings():
                 # e.g. if values is float64 and `val` is a str, suppress warning
                 warnings.filterwarnings("ignore", category=FutureWarning)
+            {{if name in {'Float64', 'Float32'} }}
+                if util.is_nan(val):
+                    indexer = np.isnan(values)
+                else:
+                    indexer = values == val
+            {{else}}
                 indexer = values == val
+            {{endif}}
         except TypeError:
             # if the equality above returns a bool, cython will raise TypeError
             # when trying to cast it to ndarray
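
The float branch added above exists because `NaN` never compares equal to itself, so `values == val` produces an all-False mask when `val` is `NaN`. The following is a rough pure-Python sketch of the same idea, not the Cython engine code: it uses plain lists and `math.isnan` in place of ndarrays, `util.is_nan`, and `np.isnan`, and `bool_indexer` is a hypothetical name.

```python
import math
from typing import List


def bool_indexer(values: List[float], val: float) -> List[bool]:
    """Return a boolean mask marking the positions in `values` that match `val`."""
    if math.isnan(val):
        # NaN != NaN, so match on "is NaN" instead of equality.
        return [math.isnan(v) for v in values]
    return [v == val for v in values]


values = [1.0, float("nan"), 3.0]
nan = float("nan")

naive_mask = [v == nan for v in values]  # plain equality never matches NaN
fixed_mask = bool_indexer(values, nan)   # NaN-aware comparison finds index 1

print(naive_mask)  # [False, False, False]
print(fixed_mask)  # [False, True, False]
```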

pandas/_libs/tslibs/__init__.py

Lines changed: 22 additions & 12 deletions
@@ -27,18 +27,28 @@
     "tz_compare",
 ]

-from . import dtypes
-from .conversion import OutOfBoundsTimedelta, localize_pydatetime
-from .dtypes import Resolution
-from .nattype import NaT, NaTType, iNaT, is_null_datetimelike, nat_strings
-from .np_datetime import OutOfBoundsDatetime
-from .offsets import BaseOffset, Tick, to_offset
-from .period import IncompatibleFrequency, Period
-from .timedeltas import Timedelta, delta_to_nanoseconds, ints_to_pytimedelta
-from .timestamps import Timestamp
-from .timezones import tz_compare
-from .tzconversion import tz_convert_from_utc_single
-from .vectorized import (
+from pandas._libs.tslibs import dtypes
+from pandas._libs.tslibs.conversion import OutOfBoundsTimedelta, localize_pydatetime
+from pandas._libs.tslibs.dtypes import Resolution
+from pandas._libs.tslibs.nattype import (
+    NaT,
+    NaTType,
+    iNaT,
+    is_null_datetimelike,
+    nat_strings,
+)
+from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime
+from pandas._libs.tslibs.offsets import BaseOffset, Tick, to_offset
+from pandas._libs.tslibs.period import IncompatibleFrequency, Period
+from pandas._libs.tslibs.timedeltas import (
+    Timedelta,
+    delta_to_nanoseconds,
+    ints_to_pytimedelta,
+)
+from pandas._libs.tslibs.timestamps import Timestamp
+from pandas._libs.tslibs.timezones import tz_compare
+from pandas._libs.tslibs.tzconversion import tz_convert_from_utc_single
+from pandas._libs.tslibs.vectorized import (
     dt64arr_to_periodarr,
     get_resolution,
     ints_to_pydatetime,

pandas/core/aggregation.py

Lines changed: 2 additions & 216 deletions
@@ -27,18 +27,16 @@
     AggFuncType,
     AggFuncTypeBase,
     AggFuncTypeDict,
-    AggObjType,
     Axis,
     FrameOrSeries,
     FrameOrSeriesUnion,
 )

-from pandas.core.dtypes.cast import is_nested_object
 from pandas.core.dtypes.common import is_dict_like, is_list_like
-from pandas.core.dtypes.generic import ABCDataFrame, ABCNDFrame, ABCSeries
+from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries

 from pandas.core.algorithms import safe_sort
-from pandas.core.base import DataError, SpecificationError
+from pandas.core.base import SpecificationError
 import pandas.core.common as com
 from pandas.core.indexes.api import Index

@@ -532,215 +530,3 @@ def transform_str_or_callable(
         return obj.apply(func, args=args, **kwargs)
     except Exception:
         return func(obj, *args, **kwargs)
-
-
-def agg_list_like(
-    obj: AggObjType,
-    arg: List[AggFuncTypeBase],
-    _axis: int,
-) -> FrameOrSeriesUnion:
-    """
-    Compute aggregation in the case of a list-like argument.
-
-    Parameters
-    ----------
-    obj : Pandas object to compute aggregation on.
-    arg : list
-        Aggregations to compute.
-    _axis : int, 0 or 1
-        Axis to compute aggregation on.
-
-    Returns
-    -------
-    Result of aggregation.
-    """
-    from pandas.core.reshape.concat import concat
-
-    if _axis != 0:
-        raise NotImplementedError("axis other than 0 is not supported")
-
-    if obj._selected_obj.ndim == 1:
-        selected_obj = obj._selected_obj
-    else:
-        selected_obj = obj._obj_with_exclusions
-
-    results = []
-    keys = []
-
-    # degenerate case
-    if selected_obj.ndim == 1:
-        for a in arg:
-            colg = obj._gotitem(selected_obj.name, ndim=1, subset=selected_obj)
-            try:
-                new_res = colg.aggregate(a)
-
-            except TypeError:
-                pass
-            else:
-                results.append(new_res)
-
-                # make sure we find a good name
-                name = com.get_callable_name(a) or a
-                keys.append(name)
-
-    # multiples
-    else:
-        for index, col in enumerate(selected_obj):
-            colg = obj._gotitem(col, ndim=1, subset=selected_obj.iloc[:, index])
-            try:
-                new_res = colg.aggregate(arg)
-            except (TypeError, DataError):
-                pass
-            except ValueError as err:
-                # cannot aggregate
-                if "Must produce aggregated value" in str(err):
-                    # raised directly in _aggregate_named
-                    pass
-                elif "no results" in str(err):
-                    # raised directly in _aggregate_multiple_funcs
-                    pass
-                else:
-                    raise
-            else:
-                results.append(new_res)
-                keys.append(col)
-
-    # if we are empty
-    if not len(results):
-        raise ValueError("no results")
-
-    try:
-        return concat(results, keys=keys, axis=1, sort=False)
-    except TypeError as err:
-
-        # we are concatting non-NDFrame objects,
-        # e.g. a list of scalars
-
-        from pandas import Series
-
-        result = Series(results, index=keys, name=obj.name)
-        if is_nested_object(result):
-            raise ValueError(
-                "cannot combine transform and aggregation operations"
-            ) from err
-        return result
-
-
-def agg_dict_like(
-    obj: AggObjType,
-    arg: AggFuncTypeDict,
-    _axis: int,
-) -> FrameOrSeriesUnion:
-    """
-    Compute aggregation in the case of a dict-like argument.
-
-    Parameters
-    ----------
-    obj : Pandas object to compute aggregation on.
-    arg : dict
-        label-aggregation pairs to compute.
-    _axis : int, 0 or 1
-        Axis to compute aggregation on.
-
-    Returns
-    -------
-    Result of aggregation.
-    """
-    is_aggregator = lambda x: isinstance(x, (list, tuple, dict))
-
-    if _axis != 0:  # pragma: no cover
-        raise ValueError("Can only pass dict with axis=0")
-
-    selected_obj = obj._selected_obj
-
-    # if we have a dict of any non-scalars
-    # eg. {'A' : ['mean']}, normalize all to
-    # be list-likes
-    # Cannot use arg.values() because arg may be a Series
-    if any(is_aggregator(x) for _, x in arg.items()):
-        new_arg: AggFuncTypeDict = {}
-        for k, v in arg.items():
-            if not isinstance(v, (tuple, list, dict)):
-                new_arg[k] = [v]
-            else:
-                new_arg[k] = v
-
-            # the keys must be in the columns
-            # for ndim=2, or renamers for ndim=1
-
-            # ok for now, but deprecated
-            # {'A': { 'ra': 'mean' }}
-            # {'A': { 'ra': ['mean'] }}
-            # {'ra': ['mean']}
-
-            # not ok
-            # {'ra' : { 'A' : 'mean' }}
-            if isinstance(v, dict):
-                raise SpecificationError("nested renamer is not supported")
-            elif isinstance(selected_obj, ABCSeries):
-                raise SpecificationError("nested renamer is not supported")
-            elif (
-                isinstance(selected_obj, ABCDataFrame) and k not in selected_obj.columns
-            ):
-                raise KeyError(f"Column '{k}' does not exist!")
-
-        arg = new_arg
-
-    else:
-        # deprecation of renaming keys
-        # GH 15931
-        keys = list(arg.keys())
-        if isinstance(selected_obj, ABCDataFrame) and len(
-            selected_obj.columns.intersection(keys)
-        ) != len(keys):
-            cols = list(
-                safe_sort(
-                    list(set(keys) - set(selected_obj.columns.intersection(keys))),
-                )
-            )
-            raise SpecificationError(f"Column(s) {cols} do not exist")
-
-    from pandas.core.reshape.concat import concat
-
-    if selected_obj.ndim == 1:
-        # key only used for output
-        colg = obj._gotitem(obj._selection, ndim=1)
-        results = {key: colg.agg(how) for key, how in arg.items()}
-    else:
-        # key used for column selection and output
-        results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}
-
-    # set the final keys
-    keys = list(arg.keys())
-
-    # Avoid making two isinstance calls in all and any below
-    is_ndframe = [isinstance(r, ABCNDFrame) for r in results.values()]
-
-    # combine results
-    if all(is_ndframe):
-        keys_to_use = [k for k in keys if not results[k].empty]
-        # Have to check, if at least one DataFrame is not empty.
-        keys_to_use = keys_to_use if keys_to_use != [] else keys
-        axis = 0 if isinstance(obj, ABCSeries) else 1
-        result = concat({k: results[k] for k in keys_to_use}, axis=axis)
-    elif any(is_ndframe):
-        # There is a mix of NDFrames and scalars
-        raise ValueError(
-            "cannot perform both aggregation "
-            "and transformation operations "
-            "simultaneously"
-        )
-    else:
-        from pandas import Series
-
-        # we have a dict of scalars
-        # GH 36212 use name only if obj is a series
-        if obj.ndim == 1:
-            obj = cast("Series", obj)
-            name = obj.name
-        else:
-            name = None
-
-        result = Series(results, name=name)
-
-    return result
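
Two steps of the removed `agg_dict_like` are easy to lose in the diff: if any value in the dict is a list, tuple, or dict, every scalar value is wrapped in a list; otherwise the keys are checked against the frame's columns. The following is a simplified standalone sketch of just those two steps on plain dicts and lists. The helper names `normalize_agg_spec` and `missing_columns` are hypothetical, `sorted` stands in for `safe_sort`, and the nested-renamer errors are left out.

```python
from typing import Any, Dict, List


def normalize_agg_spec(arg: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap scalar values in lists when at least one value is a list, tuple, or dict."""

    def is_aggregator(x: Any) -> bool:
        return isinstance(x, (list, tuple, dict))

    # A plain dict is assumed here; the removed code iterated over
    # arg.items() because `arg` could also be a Series.
    if not any(is_aggregator(v) for v in arg.values()):
        return dict(arg)
    return {k: v if is_aggregator(v) else [v] for k, v in arg.items()}


def missing_columns(keys: List[str], columns: List[str]) -> List[str]:
    """Return the requested keys that are not present in `columns`, sorted."""
    return sorted(set(keys) - set(columns))


spec = normalize_agg_spec({"A": "mean", "B": ["min", "max"]})
missing = missing_columns(["A", "C"], ["A", "B"])

print(spec)     # {'A': ['mean'], 'B': ['min', 'max']}
print(missing)  # ['C']
```

In the removed function a non-empty `missing` list led to `SpecificationError(f"Column(s) {cols} do not exist")`.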
