describe() fails with stacktrace for an empty DataFrame #2610

Closed
@jankatins

Description

This comes from a first attempt to select all rows whose ID also appears in another dataset. The selection does not work as intended (see the last call), but I think describe() should not fail on the resulting empty DataFrame.

[column names changed in output; output from last call shortened]

In [47]: all
Out[47]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 974757 entries, 0 to 974756
Data columns:
eid                  974757  non-null values
number               974757  non-null values
a                    972510  non-null values
b                    974757  non-null values
c                    929268  non-null values
d                    922700  non-null values
e                    974757  non-null values
dtypes: int64(1), object(6)
In [48]: subset = all[all["eid"].isin(other["eid"])]
In [49]: subset
Out[49]: 

Int64Index([], dtype=int64)
Empty DataFrame


In [50]: subset.describe()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-50-ff735ef04a17> in <module>()
----> 1 business_authors.describe()

C:\portabel\Python27\lib\site-packages\pandas\core\frame.pyc in describe(self, percentile_width)
   4539             series = self[column]
   4540             destat.append([series.count(), series.mean(), series.std(),
-> 4541                            series.min(), series.quantile(lb), series.median(),
   4542                            series.quantile(ub), series.max()])
   4543 

C:\portabel\Python27\lib\site-packages\pandas\core\series.pyc in min(self, axis, out, skipna, level)
   1320         if level is not None:
   1321             return self._agg_by_level('min', level=level, skipna=skipna)
-> 1322         return nanops.nanmin(self.values, skipna=skipna)
   1323 
   1324     @Substitution(name='maximum', shortname='max',

C:\portabel\Python27\lib\site-packages\pandas\core\nanops.pyc in f(values, axis, skipna, **kwds)
     46                 result = alt(values, axis=axis, skipna=skipna, **kwds)
     47         except Exception:
---> 48             result = alt(values, axis=axis, skipna=skipna, **kwds)
     49 
     50         return result

C:\portabel\Python27\lib\site-packages\pandas\core\nanops.pyc in _nanmin(values, axis, skipna)
    179              or values.size == 0):
    180             result = values.sum(axis)
--> 181             result.fill(np.nan)
    182         else:
    183             result = values.min(axis)


ValueError: cannot convert float NaN to integer



In [51]: all["eid"].isin(other["eid"])
Out[51]: 
0     False
1     False
2     False
3     False
4     False
...
974752    False
974753    False
974754    False
974755    False
974756    False
Name: eid, Length: 974757
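
For the describe() failure shown above, one possible workaround is to check for zero rows before asking for summary statistics. The following is a minimal sketch under stated assumptions, not a fix in pandas itself: the helper name `describe_or_none` is hypothetical, and it relies only on `len()` and `.describe()` being available on the object passed in, which a DataFrame provides.

```python
def describe_or_none(frame):
    """Return frame.describe(), or None if the frame has no rows.

    Hypothetical helper, not a pandas API. It only uses len(frame) and
    frame.describe().
    """
    if len(frame) == 0:
        # No rows to summarise: skip describe() so the NaN-to-integer
        # conversion seen in the traceback is never attempted.
        return None
    return frame.describe()
```

With the session above, `describe_or_none(subset)` would return None for the empty `subset` instead of raising, while non-empty frames are passed straight through to describe().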
