Calling pandas.cut with timedelta series and incompatible bins should raise TypeError #20605

Closed
@nmusolino

Code Sample

In [1]: import pandas

In [3]: import numpy

In [10]: s = pandas.Series(numpy.timedelta64(i, 's') for i in range(5))

In [11]: s
Out[11]:
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[ns]

In [13]: pandas.cut(s, bins=[0, 2, 5])
Out[13]:
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
dtype: category
Categories (2, object): [(0, 2] < (2, 5]]

In [16]: pandas.cut(s, bins=[0.0, 2.5, 5.0])    # In contrast, the floating-point case raises.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-9dc1028f6406> in <module>()
----> 1 pandas.cut(s, bins=[0.0, 2.5, 5.0])    # In contrast, this raises.

C:\...\lib\site-packages\pandas\tools\tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
    117     return _bins_to_cuts(x, bins, right=right, labels=labels,
    118                          retbins=retbins, precision=precision,
--> 119                          include_lowest=include_lowest)
    120
    121

C:\...\lib\site-packages\pandas\tools\tile.py in _bins_to_cuts(x, bins, right, labels, retbins, precision, name, include_lowest)
    189
    190     side = 'left' if right else 'right'
--> 191     ids = bins.searchsorted(x, side=side)
    192
    193     if len(algos.unique(bins)) < len(bins):

TypeError: invalid type promotion

Problem description

Calling pandas.cut with a timedelta64 series and integer bins returns an all-NaN series. This is inconsistent with two other results:

  1. Calling the function with float bins raises a TypeError as expected.
  2. Performing arithmetic comparisons with such a series (like s < 0) raises TypeError as expected.

Expected Output

Calling pandas.cut(s, bins=[0, 2, 5]) with the series s described above should raise a TypeError, because the bin edges are not of a type that is comparable with the series values.
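
To illustrate the behaviour requested here, the following is a minimal user-side sketch, not the pandas implementation. The wrapper name `cut_timedelta_strict` is hypothetical, and the sketch assumes a pandas version in which `pandas.cut` accepts timedelta bin edges. It rejects non-timedelta edges up front instead of letting them produce an all-NaN result.

```python
import pandas as pd


def cut_timedelta_strict(values, bins, **kwargs):
    """Hypothetical wrapper around pandas.cut (not part of pandas).

    Raises TypeError when timedelta64 values are cut with a sequence of
    bin edges that is not itself timedelta-like.
    """
    values = pd.Series(values)
    # Only explicit edge sequences are checked; an integer bin count or an
    # IntervalIndex is passed straight through to pandas.cut.
    if pd.api.types.is_list_like(bins) and not isinstance(bins, pd.IntervalIndex):
        edge_dtype = pd.Series(bins).dtype
        # dtype.kind == "m" marks timedelta64 data in NumPy/pandas.
        if values.dtype.kind == "m" and edge_dtype.kind != "m":
            raise TypeError(
                f"bins of dtype {edge_dtype} are not comparable with "
                f"values of dtype {values.dtype}"
            )
    return pd.cut(values, bins=bins, **kwargs)


s = pd.Series(pd.to_timedelta([0, 1, 2, 3, 4], unit="s"))

# Integer edges are rejected rather than silently yielding NaN.
try:
    cut_timedelta_strict(s, bins=[0, 2, 5])
except TypeError as exc:
    print("rejected:", exc)

# Timedelta edges are comparable with the values and are passed through.
edges = pd.to_timedelta([0, 2, 5], unit="s")
result = cut_timedelta_strict(s, bins=edges)
print(result)
```

With timedelta edges, only the first value (exactly 0 seconds) falls outside the right-closed bins, so the result is no longer all NaN. Converting the edges explicitly with `pandas.to_timedelta` also makes the intended unit visible at the call site.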

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: 1.5.0
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.3
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None

Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff); Duplicate Report (duplicate issue or pull request); Timedelta (Timedelta data type)