ERR/PERF: df.idxmax/min not friendly with DataFrames and mixed dtypes #15882

Closed
@jreback

df.idxmax / df.idxmin is not friendly with mixed-dtype DataFrames, nor is it efficient in the slightest, as it coerces all the data via .values. It needs to operate block-by-block (column-by-column is probably fine).

Example from SO:

In [11]: df = pd.DataFrame({'Date':['20/03/17 10:30:34','20/03/17 10:31:24','20/03/17 10:34:34'],
    ...:                    'Value':[4,7,5]})
    ...: 
    ...: df['Date'] = pd.to_datetime(df.Date)
    ...: df
    ...: 
Out[11]: 
                 Date  Value
0 2017-03-20 10:30:34      4
1 2017-03-20 10:31:24      7
2 2017-03-20 10:34:34      5

In [13]: df.idxmax()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-0c78e75af756> in <module>()
----> 1 df.idxmax()

/Users/jreback/pandas/pandas/core/frame.py in idxmax(self, axis, skipna)
   5112         """
   5113         axis = self._get_axis_number(axis)
-> 5114         indices = nanops.nanargmax(self.values, axis=axis, skipna=skipna)
   5115         index = self._get_axis(axis)
   5116         result = [index[i] if i >= 0 else NA for i in indices]

/Users/jreback/pandas/pandas/core/nanops.py in nanargmax(values, axis, skipna)
    462     """
    463     values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='-inf',
--> 464                                          isfinite=True)
    465     result = values.argmax(axis)
    466     result = _maybe_arg_null_out(result, axis, mask, skipna)

/Users/jreback/pandas/pandas/core/nanops.py in _get_values(values, skipna, fill_value, fill_value_typ, isfinite, copy)
    181     values = _values_from_object(values)
    182     if isfinite:
--> 183         mask = _isfinite(values)
    184     else:
    185         mask = isnull(values)

/Users/jreback/pandas/pandas/core/nanops.py in _isfinite(values)
    224             is_integer_dtype(values) or is_bool_dtype(values)):
    225         return ~np.isfinite(values)
--> 226     return ~np.isfinite(values.astype('float64'))
    227 
    228 

TypeError: float() argument must be a string or a number, not 'Timestamp'
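
The traceback ends in a float64 cast of whatever `self.values` returned. As a minimal sketch of why that fails (the frame is rebuilt here with ISO-formatted date strings so the snippet does not depend on how `to_datetime` parses `20/03/17`), the 2-D array for a datetime column plus an integer column falls back to object dtype, so each date is boxed as a `Timestamp`:

```python
import pandas as pd

# Same data as the report, with ISO date strings (an assumption made
# here to keep parsing unambiguous).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-03-20 10:30:34",
        "2017-03-20 10:31:24",
        "2017-03-20 10:34:34",
    ]),
    "Value": [4, 7, 5],
})

# Each column keeps its own dtype inside the frame ...
print(df.dtypes)

# ... but a single 2-D array needs one dtype that fits both columns,
# so everything is coerced to object.
values = df.values
print(values.dtype)        # object
print(type(values[0, 0]))  # pandas Timestamp
```

An object array holding `Timestamp` objects is what `_isfinite` then tries to cast with `astype('float64')`.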

It works fine on a Series:

In [27]: df.Date.idxmax()
Out[27]: 2

In [28]: df.Value.idxmax()
Out[28]: 1
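
Since the per-column calls above succeed, a column-by-column version can be sketched as follows. This is only an illustration of the idea, not the fix that went into pandas: `idxmax_by_column` is a hypothetical helper name, and the frame is rebuilt with ISO date strings to keep the snippet self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-03-20 10:30:34",
        "2017-03-20 10:31:24",
        "2017-03-20 10:34:34",
    ]),
    "Value": [4, 7, 5],
})


def idxmax_by_column(frame, skipna=True):
    """Hypothetical helper: call Series.idxmax on each column separately.

    Every column is reduced in its own dtype, so nothing is coerced
    to a common object array first.
    """
    return pd.Series(
        {name: column.idxmax(skipna=skipna) for name, column in frame.items()}
    )


result = idxmax_by_column(df)
print(result)
# Date     2
# Value    1
```

The result matches the two Series calls: label 2 for Date and label 1 for Value. Building the result from a dict assumes unique column names.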

Metadata

    Labels

    Dtype Conversions: Unexpected or buggy dtype conversions
    Error Reporting: Incorrect or improved errors from pandas
    Indexing: Related to indexing on series/frames, not to indexes themselves
    Performance: Memory or execution speed performance
