BUG: 1.4.0rc1 Error vectorizing grouping aggregation on empty dataframe with object column #45231

Closed
@TheNeuralBit

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

In [20]: df = pd.DataFrame({'group': pd.Series(dtype='object'), 'str': pd.Series(dtype='object')})                  
                                                                                                                    
In [21]: df.groupby('group').any()                        
---------------------------------------------------------------------------                                         
ValueError                                Traceback (most recent call last)                                         
<ipython-input-21-d0b9fa3e2ddd> in <module>               
----> 1 df.groupby('group').any()                                                                                   

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in any(self, skipna)                                                                                                                       
   1804             is True within its respective group, False otherwise.                                           
   1805         """                                       
-> 1806         return self._bool_agg("any", skipna)                                                                
   1807                                                   
   1808     @final                                        

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _bool_agg(self, val_test, skipna)
   1774                 return result.astype(inference, copy=False)                                                 
   1775                                                   
-> 1776         return self._get_cythonized_result(                                                                 
   1777             libgroupby.group_any_all,             
   1778             numeric_only=False,                                                                             

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, base_func, cython_dtype, numeric_only, needs_counts, needs_nullable, needs_mask, pre_processing, post_processing, **kwargs)
   3383             mgr = mgr.get_numeric_data()                                                                    
   3384                                                                                                             
-> 3385         res_mgr = mgr.grouped_reduce(blk_func, ignore_failures=True)                                        
   3386                                                   
   3387         if not is_ser and len(res_mgr.items) != len(mgr.items):                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/managers.py in grouped_reduce(self, func, ignore_failures)
   1338                 for sb in blk._split():           
   1339                     try: 
-> 1340                         applied = sb.apply(func)  
   1341                     except (TypeError, NotImplementedError):                                                
   1342                         if not ignore_failures:                                                             
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/internals/blocks.py in apply(self, func, **kwargs)                                                                                                            
    388         one                                       
    389         """                                                                                                 
--> 390         result = func(self.values, **kwargs)      
    391                                                                                                             
    392         return self._split_op_result(result)      
                                                          
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in blk_func(values)    
   3342             vals = values                         
   3343             if pre_processing:                    
-> 3344                 vals, inferences = pre_processing(vals)                                                     
   3345                                                   
   3346             vals = vals.astype(cython_dtype, copy=False)                                                    
                                                                                                                    
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in objs_to_bool(vals)  
   1752                 if skipna:                        
   1753                     func = np.vectorize(lambda x: bool(x) if not isna(x) else True)                         
-> 1754                     vals = func(vals)                                                                       
   1755                 else:                             
   1756                     vals = vals.astype(bool, copy=False)                                                    

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)                                                                                                             
   2106             vargs.extend([kwargs[_n] for _n in names])                                                      
   2107                                                   
-> 2108         return self._vectorize_call(func=func, args=vargs)                                                  
   2109                                                   
   2110     def _get_ufunc_and_otypes(self, func, args):  

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)    
   2184             res = func()                                                                                    
   2185         else:                                     
-> 2186             ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)                                
   2187                                                   
   2188             # Convert args to object arrays first                                                           

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/numpy/lib/function_base.py in _get_ufunc_and_otypes(self, func, args)                                                                                                     
   2140             args = [asarray(arg) for arg in args] 
   2141             if builtins.any(arg.size == 0 for arg in args):                                                 
-> 2142                 raise ValueError('cannot call `vectorize` on size 0 inputs '                                
   2143                                  'unless `otypes` is set')                                                  
   2144                                                   
                                                                                                                    
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

Issue Description

Some grouped aggregations raise a ValueError (from NumPy's vectorize code) when applied to an empty DataFrame that has an object-dtype column. I've only observed this with any and all (perhaps because other aggregations drop the object column by default).
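
The NumPy behavior at the bottom of the traceback can be reproduced without pandas. The following is a minimal sketch, not the pandas code itself: `to_bool_skipna` is a hypothetical stand-in for the lambda shown in `objs_to_bool`, with the `isna` check simplified to a `None` check. It shows that `np.vectorize` refuses size-0 inputs unless `otypes` is given, and that passing `otypes=[bool]` yields an empty boolean array instead. Whether setting `otypes` is the right fix inside pandas is an assumption here, not something this report confirms.

```python
import numpy as np


def to_bool_skipna(x):
    # Hypothetical stand-in for the lambda in the traceback;
    # the missing-value check is simplified to `x is None`.
    return True if x is None else bool(x)


# Same shape of input as the empty object column in the report.
empty = np.array([], dtype=object)

# Without `otypes`, NumPy cannot infer the output dtype from zero elements.
try:
    np.vectorize(to_bool_skipna)(empty)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

# With `otypes` set, the call succeeds and returns an empty bool array.
safe = np.vectorize(to_bool_skipna, otypes=[bool])
result = safe(empty)

print(raised, message)
print(result.dtype, result.shape)
```

Declaring the output dtype up front removes the need for NumPy to probe the function with a first element, which is the step that has nothing to work with on empty input.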

I've only observed this behavior in 1.4.0rc1. The same code works in previous pandas versions; I haven't tested against master.

Expected Behavior

Pandas should produce an empty result, as in previous versions.

Installed Versions

1.4.0rc1
