ERR: too strict validation on groupby.rolling with time-aware freq #15130

Closed
@jreback

http://stackoverflow.com/questions/41642320/efficient-pandas-rolling-aggregation-over-date-range-by-group-python-2-7-windo/41643179?noredirect=1#comment70486923_41643179

In [1]: import pandas as pd
   ...: 
   ...: data = [
   ...: ['David', '1/1/2015', 100], ['David', '1/5/2015', 500], ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
   ...: ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500], ['Ryan', '3/31/2016', 50],
   ...: ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500], ['Joe', '10/15/2015', 50]
   ...: ]
   ...: 
   ...: list_of_vals = []
   ...: 
   ...: dates_df = pd.DataFrame(data=data, columns=['name', 'date', 'amount'], index=None)
   ...: dates_df['date'] = pd.to_datetime(dates_df['date'])
   ...: 

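The monotonic check on the `on` column doesn't need to run against the whole frame when we are grouping. Here `date` is not monotonic across the full frame, but it is monotonic within each `name` group, which is all the per-group rolling needs: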

In [7]: dates_df.groupby('name').rolling('180D', on='date')['amount'].sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-8896cb99a66a> in <module>()
----> 1 dates_df.groupby('name').rolling('180D', on='date')['amount'].sum()

/Users/jreback/pandas/pandas/core/groupby.py in rolling(self, *args, **kwargs)
   1148         """
   1149         from pandas.core.window import RollingGroupby
-> 1150         return RollingGroupby(self, *args, **kwargs)
   1151 
   1152     @Substitution(name='groupby')

/Users/jreback/pandas/pandas/core/window.py in __init__(self, obj, *args, **kwargs)
    635         self._groupby.mutated = True
    636         self._groupby.grouper.mutated = True
--> 637         super(GroupByMixin, self).__init__(obj, *args, **kwargs)
    638 
    639     count = GroupByMixin._dispatch('count')

/Users/jreback/pandas/pandas/core/window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, **kwargs)
     76         self.win_type = win_type
     77         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 78         self.validate()
     79 
     80     @property

/Users/jreback/pandas/pandas/core/window.py in validate(self)
   1030                 formatted = self.on or 'index'
   1031                 raise ValueError("{0} must be "
-> 1032                                  "monotonic".format(formatted))
   1033 
   1034             from pandas.tseries.frequencies import to_offset

ValueError: date must be monotonic
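
To make the cause of the error concrete, here is a minimal sketch (not the pandas internals) that checks monotonicity of `date` both across the whole frame and per group, using the same data as above. The variable names `whole_frame_monotonic` and `per_group_monotonic` are only for illustration.

```python
import pandas as pd

data = [
    ['David', '1/1/2015', 100], ['David', '1/5/2015', 500],
    ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
    ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500],
    ['Ryan', '3/31/2016', 50],
    ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500],
    ['Joe', '10/15/2015', 50],
]
dates_df = pd.DataFrame(data, columns=['name', 'date', 'amount'])
dates_df['date'] = pd.to_datetime(dates_df['date'])

# The dates jump backwards where one name ends and the next begins,
# so the column as a whole is not monotonic.
whole_frame_monotonic = dates_df['date'].is_monotonic_increasing

# Within each name, the dates are already in increasing order.
per_group_monotonic = dates_df.groupby('name')['date'].apply(
    lambda s: s.is_monotonic_increasing
)

print(whole_frame_monotonic)
print(per_group_monotonic)
```

The whole-frame check is the one that trips the `ValueError` in the traceback, even though every group on its own would pass.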

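Applying the rolling window inside each group via `apply` works, because the check then only sees one group's dates at a time: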

In [9]: dates_df.groupby('name').apply(lambda x: x.rolling('180D', on='date')['amount'].sum())
Out[9]: 
name    
David  0    100.0
       1    600.0
       2    650.0
       3    100.0
Joe    7    100.0
       8    600.0
       9    650.0
Ryan   4    100.0
       5    500.0
       6     50.0
Name: amount, dtype: float64

Labels: Bug, Error Reporting, Groupby, Reshaping
