Commit a5ef0c3

Merge branch 'master' into GH36666
2 parents 89af5bb + 2f55283 commit a5ef0c3

60 files changed: +1554 -1168 lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,11 @@ repos:
6262
|math|module|note|raw|seealso|toctree|versionadded
6363
|versionchanged|warning):[^:]
6464
files: \.(py|pyx|rst)$
65+
- id: incorrect-code-directives
66+
name: Check for incorrect code block or IPython directives
67+
language: pygrep
68+
entry: (\.\. code-block ::|\.\. ipython ::)
69+
files: \.(py|pyx|rst)$
6570
- repo: https://github.com/asottile/yesqa
6671
rev: v1.2.2
6772
hooks:
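
Note on the new `incorrect-code-directives` hook: a `pygrep` hook flags any line in the matched files where the `entry` regex is found. The sketch below applies the same pattern with Python's `re` module to a few sample lines. The helper name `find_bad_directives` and the sample text are illustrative assumptions, not part of the commit.

```python
import re

# Pattern copied from the hook's `entry` field in the diff above.
PATTERN = re.compile(r"(\.\. code-block ::|\.\. ipython ::)")


def find_bad_directives(text):
    """Return (line_number, line) pairs with a space before the '::'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


# Hypothetical sample: two correct directives and two incorrect ones.
sample = "\n".join(
    [
        ".. code-block:: python",
        ".. code-block :: python",
        ".. ipython ::",
        ".. ipython:: python",
    ]
)
bad = find_bad_directives(sample)
```

Unlike the removed `invgrep` checks, which passed `.. code-block ::` unescaped, the hook pattern escapes the leading dots so they only match literal periods.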

ci/code_checks.sh

Lines changed: 0 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -207,18 +207,6 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
207207
invgrep -r -E --include '*.py' '(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)' pandas/tests/
208208
RET=$(($RET + $?)) ; echo $MSG "DONE"
209209

210-
MSG='Check for wrong space after code-block directive and before colon (".. code-block ::" instead of ".. code-block::")' ; echo $MSG
211-
invgrep -R --include="*.rst" ".. code-block ::" doc/source
212-
RET=$(($RET + $?)) ; echo $MSG "DONE"
213-
214-
MSG='Check for wrong space after ipython directive and before colon (".. ipython ::" instead of ".. ipython::")' ; echo $MSG
215-
invgrep -R --include="*.rst" ".. ipython ::" doc/source
216-
RET=$(($RET + $?)) ; echo $MSG "DONE"
217-
218-
MSG='Check for extra blank lines after the class definition' ; echo $MSG
219-
invgrep -R --include="*.py" --include="*.pyx" -E 'class.*:\n\n( )+"""' .
220-
RET=$(($RET + $?)) ; echo $MSG "DONE"
221-
222210
MSG='Check for use of {foo!r} instead of {repr(foo)}' ; echo $MSG
223211
invgrep -R --include=*.{py,pyx} '!r}' pandas
224212
RET=$(($RET + $?)) ; echo $MSG "DONE"
@@ -243,12 +231,6 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
243231
invgrep -R --include=*.{py,pyx} '\.__class__' pandas
244232
RET=$(($RET + $?)) ; echo $MSG "DONE"
245233

246-
MSG='Check that no file in the repo contains trailing whitespaces' ; echo $MSG
247-
INVGREP_APPEND=" <- trailing whitespaces found"
248-
invgrep -RI --exclude=\*.{svg,c,cpp,html,js} --exclude-dir=env "\s$" *
249-
RET=$(($RET + $?)) ; echo $MSG "DONE"
250-
unset INVGREP_APPEND
251-
252234
MSG='Check code for instances of os.remove' ; echo $MSG
253235
invgrep -R --include="*.py*" --exclude "common.py" --exclude "test_writers.py" --exclude "test_store.py" -E "os\.remove" pandas/tests/
254236
RET=$(($RET + $?)) ; echo $MSG "DONE"

doc/source/ecosystem.rst

Lines changed: 18 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -230,7 +230,7 @@ allows users to view, manipulate and edit pandas ``Index``, ``Series``,
230230
and ``DataFrame`` objects like a "spreadsheet", including copying and modifying
231231
values, sorting, displaying a "heatmap", converting data types and more.
232232
pandas objects can also be renamed, duplicated, new columns added,
233-
copyed/pasted to/from the clipboard (as TSV), and saved/loaded to/from a file.
233+
copied/pasted to/from the clipboard (as TSV), and saved/loaded to/from a file.
234234
Spyder can also import data from a variety of plain text and binary files
235235
or the clipboard into a new pandas DataFrame via a sophisticated import wizard.
236236

@@ -376,6 +376,23 @@ Dask-ML enables parallel and distributed machine learning using Dask alongside e
376376

377377
Koalas provides a familiar pandas DataFrame interface on top of Apache Spark. It enables users to leverage multi-cores on one machine or a cluster of machines to speed up or scale their DataFrame code.
378378

379+
`Modin <https://github.com/modin-project/modin>`__
380+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381+
382+
The ``modin.pandas`` DataFrame is a parallel and distributed drop-in replacement
383+
for pandas. This means that you can use Modin with existing pandas code or write
384+
new code with the existing pandas API. Modin can leverage your entire machine or
385+
cluster to speed up and scale your pandas workloads, including traditionally
386+
time-consuming tasks like ingesting data (``read_csv``, ``read_excel``,
387+
``read_parquet``, etc.).
388+
389+
.. code:: python
390+
391+
# import pandas as pd
392+
import modin.pandas as pd
393+
394+
df = pd.read_csv("big.csv") # use all your cores!
395+
379396
`Odo <http://odo.pydata.org>`__
380397
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381398

@@ -400,16 +417,6 @@ If also displays progress bars.
400417
# df.apply(func)
401418
df.parallel_apply(func)
402419
403-
`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
404-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405-
406-
pandas on Ray is an early stage DataFrame library that wraps pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous pandas notebooks while experiencing a considerable speedup from pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use pandas on Ray just like you would pandas.
407-
408-
.. code:: python
409-
410-
# import pandas as pd
411-
import ray.dataframe as pd
412-
413420
414421
`Vaex <https://docs.vaex.io/>`__
415422
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/whatsnew/v0.16.2.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -147,7 +147,7 @@ Bug fixes
147147
- Bug in ``setitem`` where type promotion is applied to the entire block (:issue:`10280`)
148148
- Bug in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)
149149
- Bug in ``GroupBy.get_group`` when grouping on multiple keys, one of which is categorical. (:issue:`10132`)
150-
- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetics ( :issue:`9926`)
150+
- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetic ( :issue:`9926`)
151151
- Bug in ``DataFrame`` construction from nested ``dict`` with ``datetime64`` (:issue:`10160`)
152152
- Bug in ``Series`` construction from ``dict`` with ``datetime64`` keys (:issue:`9456`)
153153
- Bug in ``Series.plot(label="LABEL")`` not correctly setting the label (:issue:`10119`)

doc/source/whatsnew/v0.24.1.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
.. _whatsnew_0241:
22

3-
Whats new in 0.24.1 (February 3, 2019)
4-
--------------------------------------
3+
What's new in 0.24.1 (February 3, 2019)
4+
---------------------------------------
55

66
.. warning::
77

doc/source/whatsnew/v0.24.2.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
.. _whatsnew_0242:
22

3-
Whats new in 0.24.2 (March 12, 2019)
4-
------------------------------------
3+
What's new in 0.24.2 (March 12, 2019)
4+
-------------------------------------
55

66
.. warning::
77

doc/source/whatsnew/v1.1.4.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ Fixed regressions
2020
- Fixed regression in :class:`RollingGroupby` with ``sort=False`` not being respected (:issue:`36889`)
2121
- Fixed regression in :meth:`Series.astype` converting ``None`` to ``"nan"`` when casting to string (:issue:`36904`)
2222
- Fixed regression in :class:`RollingGroupby` causing a segmentation fault with Index of dtype object (:issue:`36727`)
23+
- Fixed regression in :meth:`DataFrame.resample(...).apply(...)` raised ``AttributeError`` when input was a :class:`DataFrame` and only a :class:`Series` was evaluated (:issue:`36951`)
24+
- Fixed regression in :class:`PeriodDtype` comparing both equal and unequal to its string representation (:issue:`37265`)
2325

2426
.. ---------------------------------------------------------------------------
2527
@@ -30,6 +32,7 @@ Bug fixes
3032
- Bug causing ``groupby(...).sum()`` and similar to not preserve metadata (:issue:`29442`)
3133
- Bug in :meth:`Series.isin` and :meth:`DataFrame.isin` raising a ``ValueError`` when the target was read-only (:issue:`37174`)
3234
- Bug in :meth:`GroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
35+
- Bug in :meth:`DataFrame.info` was raising a ``KeyError`` when the DataFrame has integer column names (:issue:`37245`)
3336

3437
.. ---------------------------------------------------------------------------
3538

doc/source/whatsnew/v1.2.0.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,7 @@ Alternatively, you can also use the dtype object:
180180
.. warning::
181181

182182
Experimental: the new floating data types are currently experimental, and its
183-
behaviour or API may still change without warning. Expecially the behaviour
183+
behaviour or API may still change without warning. Especially the behaviour
184184
regarding NaN (distinct from NA missing values) is subject to change.
185185

186186
.. _whatsnew_120.index_name_preservation:
@@ -524,7 +524,8 @@ Other
524524

525525
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
526526
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
527-
- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors (:issue:`28283`)
527+
- Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (:issue:`37037`)
528+
- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors and :class:`DataFrame.duplicated` and :class:`DataFrame.stack` methods (:issue:`28283`)
528529
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
529530
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
530531

pandas/_testing.py

Lines changed: 46 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
import gzip
77
import operator
88
import os
9+
import re
910
from shutil import rmtree
1011
import string
1112
import tempfile
@@ -2546,10 +2547,11 @@ def wrapper(*args, **kwargs):
25462547

25472548
@contextmanager
25482549
def assert_produces_warning(
2549-
expected_warning=Warning,
2550+
expected_warning: Optional[Union[Type[Warning], bool]] = Warning,
25502551
filter_level="always",
2551-
check_stacklevel=True,
2552-
raise_on_extra_warnings=True,
2552+
check_stacklevel: bool = True,
2553+
raise_on_extra_warnings: bool = True,
2554+
match: Optional[str] = None,
25532555
):
25542556
"""
25552557
Context manager for running code expected to either raise a specific
@@ -2584,6 +2586,8 @@ class for all warnings. To check that no warning is returned,
25842586
raise_on_extra_warnings : bool, default True
25852587
Whether extra warnings not of the type `expected_warning` should
25862588
cause the test to fail.
2589+
match : str, optional
2590+
Match warning message.
25872591
25882592
Examples
25892593
--------
@@ -2610,28 +2614,28 @@ class for all warnings. To check that no warning is returned,
26102614
with warnings.catch_warnings(record=True) as w:
26112615

26122616
saw_warning = False
2617+
matched_message = False
2618+
26132619
warnings.simplefilter(filter_level)
26142620
yield w
26152621
extra_warnings = []
26162622

26172623
for actual_warning in w:
2618-
if expected_warning and issubclass(
2619-
actual_warning.category, expected_warning
2620-
):
2624+
if not expected_warning:
2625+
continue
2626+
2627+
expected_warning = cast(Type[Warning], expected_warning)
2628+
if issubclass(actual_warning.category, expected_warning):
26212629
saw_warning = True
26222630

26232631
if check_stacklevel and issubclass(
26242632
actual_warning.category, (FutureWarning, DeprecationWarning)
26252633
):
2626-
from inspect import getframeinfo, stack
2634+
_assert_raised_with_correct_stacklevel(actual_warning)
2635+
2636+
if match is not None and re.search(match, str(actual_warning.message)):
2637+
matched_message = True
26272638

2628-
caller = getframeinfo(stack()[2][0])
2629-
msg = (
2630-
"Warning not set with correct stacklevel. "
2631-
f"File where warning is raised: {actual_warning.filename} != "
2632-
f"{caller.filename}. Warning message: {actual_warning.message}"
2633-
)
2634-
assert actual_warning.filename == caller.filename, msg
26352639
else:
26362640
extra_warnings.append(
26372641
(
@@ -2641,18 +2645,41 @@ class for all warnings. To check that no warning is returned,
26412645
actual_warning.lineno,
26422646
)
26432647
)
2648+
26442649
if expected_warning:
2645-
msg = (
2646-
f"Did not see expected warning of class "
2647-
f"{repr(expected_warning.__name__)}"
2648-
)
2649-
assert saw_warning, msg
2650+
expected_warning = cast(Type[Warning], expected_warning)
2651+
if not saw_warning:
2652+
raise AssertionError(
2653+
f"Did not see expected warning of class "
2654+
f"{repr(expected_warning.__name__)}"
2655+
)
2656+
2657+
if match and not matched_message:
2658+
raise AssertionError(
2659+
f"Did not see warning {repr(expected_warning.__name__)} "
2660+
f"matching {match}"
2661+
)
2662+
26502663
if raise_on_extra_warnings and extra_warnings:
26512664
raise AssertionError(
26522665
f"Caused unexpected warning(s): {repr(extra_warnings)}"
26532666
)
26542667

26552668

2669+
def _assert_raised_with_correct_stacklevel(
2670+
actual_warning: warnings.WarningMessage,
2671+
) -> None:
2672+
from inspect import getframeinfo, stack
2673+
2674+
caller = getframeinfo(stack()[3][0])
2675+
msg = (
2676+
"Warning not set with correct stacklevel. "
2677+
f"File where warning is raised: {actual_warning.filename} != "
2678+
f"{caller.filename}. Warning message: {actual_warning.message}"
2679+
)
2680+
assert actual_warning.filename == caller.filename, msg
2681+
2682+
26562683
class RNGContext:
26572684
"""
26582685
Context manager to set the numpy random number generator seed. Returns
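
Note on the new `match` parameter: the diff above adds a regex check on the message of each warning of the expected class, and fails if none matches. The following is a simplified standalone sketch of that idea using only the standard library. `expect_warning` is a hypothetical stand-in, not the pandas helper, and it omits the stacklevel and extra-warning checks.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match=None):
    """Hypothetical, reduced version of a warning-asserting context manager."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught

    # Only warnings of the expected class are candidates for the match.
    relevant = [w for w in caught if issubclass(w.category, category)]
    if not relevant:
        raise AssertionError(
            f"Did not see expected warning of class {repr(category.__name__)}"
        )
    if match is not None and not any(
        re.search(match, str(w.message)) for w in relevant
    ):
        raise AssertionError(
            f"Did not see warning {repr(category.__name__)} matching {match}"
        )


# Passes: the message contains "deprecated".
with expect_warning(FutureWarning, match="deprecated"):
    warnings.warn("this option is deprecated", FutureWarning)

# Fails: right class, but the message does not match "removed".
try:
    with expect_warning(FutureWarning, match="removed"):
        warnings.warn("this option is deprecated", FutureWarning)
except AssertionError as err:
    failure = str(err)
else:
    failure = None
```

As in the diff, `re.search` is used, so the pattern may match anywhere in the message rather than only at the start.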

pandas/conftest.py

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -361,6 +361,19 @@ def multiindex_year_month_day_dataframe_random_data():
361361
return ymd
362362

363363

364+
@pytest.fixture
365+
def multiindex_dataframe_random_data():
366+
"""DataFrame with 2 level MultiIndex with random data"""
367+
index = MultiIndex(
368+
levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
369+
codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
370+
names=["first", "second"],
371+
)
372+
return DataFrame(
373+
np.random.randn(10, 3), index=index, columns=Index(["A", "B", "C"], name="exp")
374+
)
375+
376+
364377
def _create_multiindex():
365378
"""
366379
MultiIndex used to test the general functionality of this object

pandas/core/arrays/base.py

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -507,7 +507,12 @@ def _values_for_argsort(self) -> np.ndarray:
507507
return np.array(self)
508508

509509
def argsort(
510-
self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
510+
self,
511+
ascending: bool = True,
512+
kind: str = "quicksort",
513+
na_position: str = "last",
514+
*args,
515+
**kwargs,
511516
) -> np.ndarray:
512517
"""
513518
Return the indices that would sort this array.
@@ -538,7 +543,14 @@ def argsort(
538543
# 2. argsort : total control over sorting.
539544
ascending = nv.validate_argsort_with_ascending(ascending, args, kwargs)
540545

541-
result = nargsort(self, kind=kind, ascending=ascending, na_position="last")
546+
values = self._values_for_argsort()
547+
result = nargsort(
548+
values,
549+
kind=kind,
550+
ascending=ascending,
551+
na_position=na_position,
552+
mask=np.asarray(self.isna()),
553+
)
542554
return result
543555

544556
def argmin(self):
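
Note on the new `na_position` argument: the diff forwards `na_position` and a missing-value mask to `nargsort` instead of hard-coding `"last"`. The pure-Python sketch below illustrates the intended ordering only. `argsort_with_na` is a hypothetical helper that uses `None` as the missing-value marker, and it does not reproduce the `nargsort` implementation.

```python
def argsort_with_na(values, ascending=True, na_position="last"):
    """Return indices that sort `values`, placing None entries first or last."""
    if na_position not in ("first", "last"):
        raise ValueError(f"invalid na_position: {na_position}")

    # Split positions by the missing-value mask.
    missing = [i for i, v in enumerate(values) if v is None]
    present = [i for i, v in enumerate(values) if v is not None]

    # Sort only the non-missing positions by their values.
    present.sort(key=lambda i: values[i], reverse=not ascending)

    if na_position == "first":
        return missing + present
    return present + missing


data = [3.0, None, 1.0, 2.0, None]
last = argsort_with_na(data)
first = argsort_with_na(data, na_position="first")
descending = argsort_with_na(data, ascending=False)
```

In this sketch the missing entries keep their original relative order and are unaffected by `ascending`; only their placement changes with `na_position`.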

pandas/core/arrays/datetimelike.py

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -444,7 +444,7 @@ def _validate_comparison_value(self, other, opname: str):
444444

445445
else:
446446
try:
447-
other = self._validate_listlike(other, opname, allow_object=True)
447+
other = self._validate_listlike(other, allow_object=True)
448448
self._check_compatible_with(other)
449449
except TypeError as err:
450450
if is_object_dtype(getattr(other, "dtype", None)):
@@ -548,7 +548,7 @@ def _validate_scalar(self, value, msg: Optional[str] = None):
548548

549549
return value
550550

551-
def _validate_listlike(self, value, opname: str, allow_object: bool = False):
551+
def _validate_listlike(self, value, allow_object: bool = False):
552552
if isinstance(value, type(self)):
553553
return value
554554

@@ -578,18 +578,17 @@ def _validate_listlike(self, value, opname: str, allow_object: bool = False):
578578

579579
elif not type(self)._is_recognized_dtype(value.dtype):
580580
raise TypeError(
581-
f"{opname} requires compatible dtype or scalar, "
582-
f"not {type(value).__name__}"
581+
f"value should be a '{self._scalar_type.__name__}', 'NaT', "
582+
f"or array of those. Got '{type(value).__name__}' instead."
583583
)
584-
585584
return value
586585

587586
def _validate_searchsorted_value(self, value):
588587
msg = "searchsorted requires compatible dtype or scalar"
589588
if not is_list_like(value):
590589
value = self._validate_scalar(value, msg)
591590
else:
592-
value = self._validate_listlike(value, "searchsorted")
591+
value = self._validate_listlike(value)
593592

594593
rv = self._unbox(value)
595594
return self._rebox_native(rv)
@@ -600,7 +599,7 @@ def _validate_setitem_value(self, value):
600599
f"or array of those. Got '{type(value).__name__}' instead."
601600
)
602601
if is_list_like(value):
603-
value = self._validate_listlike(value, "setitem")
602+
value = self._validate_listlike(value)
604603
else:
605604
value = self._validate_scalar(value, msg)
606605

@@ -622,7 +621,7 @@ def _validate_where_value(self, other):
622621
if not is_list_like(other):
623622
other = self._validate_scalar(other, msg)
624623
else:
625-
other = self._validate_listlike(other, "where")
624+
other = self._validate_listlike(other)
626625

627626
return self._unbox(other, setitem=True)
628627

pandas/core/dtypes/dtypes.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -907,6 +907,9 @@ def __eq__(self, other: Any) -> bool:
907907

908908
return isinstance(other, PeriodDtype) and self.freq == other.freq
909909

910+
def __ne__(self, other: Any) -> bool:
911+
return not self.__eq__(other)
912+
910913
def __setstate__(self, state):
911914
# for pickle compat. __getstate__ is defined in the
912915
# PandasExtensionDtype superclass and uses the public properties to
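
Note on the explicit `__ne__`: the whatsnew entry above describes `PeriodDtype` comparing both equal and unequal to its string representation. One general way such an inconsistency can arise in Python is when a class overrides `__eq__` but inherits a `__ne__` that does not consult that override. The sketch below shows this with hypothetical classes; it is an illustration of the mechanism under that assumption, not a claim about the exact cause in pandas.

```python
class Base:
    """Hypothetical base class whose __ne__ carries its own comparison logic."""

    def __init__(self, freq):
        self.freq = freq

    def __eq__(self, other):
        return isinstance(other, Base) and self.freq == other.freq

    def __ne__(self, other):
        # Does not call self.__eq__, so subclass overrides are ignored here.
        return not (isinstance(other, Base) and self.freq == other.freq)


class Inconsistent(Base):
    """Overrides __eq__ to accept strings but inherits Base.__ne__."""

    def __eq__(self, other):
        if isinstance(other, str):
            return other == f"period[{self.freq}]"
        return super().__eq__(other)


class Consistent(Inconsistent):
    """Same fix as in the diff: define __ne__ as the negation of __eq__."""

    def __ne__(self, other):
        return not self.__eq__(other)


a = Inconsistent("D")
b = Consistent("D")
both_true = (a == "period[D]", a != "period[D]")
fixed = (b == "period[D]", b != "period[D]")
```

With the explicit `__ne__`, equality and inequality always give opposite answers for the same operand.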
