BUG: Missing keys using aggregation dictionary that are unsortable raise TypeError instead of SpecificationError #39025

Closed
@simonjayhawkins

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

>>> import numpy as np
>>> import pandas as pd
>>>
>>> pd.__version__
'1.3.0.dev0+358.g68db2d26dd'
>>>
>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=[1, "foo", None],
... )
>>> r = df.resample("3T")
>>> r.agg({1: "mean", "foo": "sum"})
                            1        foo
2012-01-01 00:00:00 -0.070340   9.142790
2012-01-01 00:03:00 -0.048538  31.041777
2012-01-01 00:06:00 -0.058046  -0.394660
2012-01-01 00:09:00  0.014268  -8.947485
2012-01-01 00:12:00  0.040080  -6.869409
2012-01-01 00:15:00  0.020587   0.225159
>>>
>>> r.agg({2: "mean", "bar": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 741, in agg_dict_like
    cols = sorted(set(keys) - set(selected_obj.columns.intersection(keys)))
TypeError: '<' not supported between instances of 'str' and 'int'
>>>
>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=["A", "B", "C"],
... )
>>>
>>> r = df.resample("3T")
>>> r.agg({"r1": "mean", "r2": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 742, in agg_dict_like
    raise SpecificationError(f"Column(s) {cols} do not exist")
pandas.core.base.SpecificationError: Column(s) ['r1', 'r2'] do not exist
>>>

The code sample is based on test_agg_consistency in pandas/tests/resample/test_resample_api.py.

Problem description

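agg_dict_like calls sorted() on the missing keys before it builds the error message. When those keys cannot be ordered against each other (here the int 2 and the str "bar"), sorted() raises TypeError and the intended SpecificationError is never reached. When the missing keys are all strings, as in the second example, the SpecificationError is raised as expected.

mypy 0.790 flags the same line:

pandas/core/aggregation.py:741: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]" [type-var]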
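The failure can be isolated from pandas entirely. The snippet below is a minimal sketch that mirrors the set difference on line 741 of the traceback, using plain Python containers in place of the DataFrame columns. The variable names are illustrative and are not taken from the pandas source.

```python
# Aggregation spec and column labels from the failing example above.
keys = {2: "mean", "bar": "sum"}
columns = [1, "foo", None]

# Same shape as the expression in the traceback: keys that are not columns.
missing = set(keys) - set(columns)

message = None
try:
    # int and str have no defined ordering, so sorting mixed keys fails.
    sorted(missing)
except TypeError as err:
    message = str(err)

print(message)
```

Any pair of missing keys with no defined ordering triggers the same error, regardless of which columns exist.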

Expected Output

pandas.core.base.SpecificationError: Column(s) [2, 'bar'] do not exist
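One way to get the expected message is to avoid sorting and report the missing keys in the order they appear in the dictionary. The following is a sketch under that assumption, not the fix adopted in pandas: check_missing_keys is a hypothetical helper, and the SpecificationError class is a local stand-in so the example runs without pandas.

```python
class SpecificationError(Exception):
    """Local stand-in for pandas' SpecificationError (assumption for this sketch)."""


def check_missing_keys(columns, keys):
    """Raise if any aggregation key is not a column label.

    Hypothetical helper: keeps the dictionary's insertion order instead of
    sorting, so keys of mixed types never need to be compared.
    """
    present = set(columns)
    missing = [key for key in keys if key not in present]
    if missing:
        raise SpecificationError(f"Column(s) {missing} do not exist")


# Keys that all exist pass silently.
check_missing_keys([1, "foo", None], {1: "mean", "foo": "sum"})

# Mixed-type missing keys now produce the intended error.
try:
    check_missing_keys([1, "foo", None], {2: "mean", "bar": "sum"})
except SpecificationError as err:
    result = str(err)

print(result)
```

Preserving insertion order makes the message deterministic without requiring the keys to be comparable. Sorting with a string key function would be another option, at the cost of ordering 10 before 2.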


Labels: Bug, Error Reporting, Regression, Resample
