Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

ts = pd.Series([pd.Timestamp('2000-01-01'),
                pd.Timestamp('2000-01-02'),
                pd.Timestamp('2000-01-02'),
                pd.Timestamp('2000-01-03')])
bins = pd.interval_range(ts[0], ts[3])
# 1) `IntervalIndex` with `pd.Series`:
pd.cut(ts, bins)
Output:
0 NaN
1 NaN
2 NaN
3 NaN
dtype: category
Categories (2, interval[datetime64[ns], right]): [(2000-01-01, 2000-01-02] < (2000-01-02, 2000-01-03]]
# 2) `IntervalIndex` with `np.ndarray`:
pd.cut(ts.to_numpy(), bins)
Output:
[NaN, NaN, NaN, NaN]
Categories (2, interval[datetime64[ns], right]): [(2000-01-01, 2000-01-02] < (2000-01-02, 2000-01-03]]
# 3) `DatetimeIndex` with list:
dt_indices = pd.date_range(ts[0], ts[3])
pd.cut(ts.to_list(), dt_indices) # <-- produces TypeError
TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/Users/Vladimir/nb.ipynb Cell 30 in <cell line: 1>()
----> 1 pd.cut(ts.to_list(), dt_indices)
File ~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/core/reshape/tile.py:292, in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
289 if (np.diff(bins.astype("float64")) < 0).any():
290 raise ValueError("bins must increase monotonically.")
--> 292 fac, bins = _bins_to_cuts(
293 x,
294 bins,
295 right=right,
296 labels=labels,
297 precision=precision,
298 include_lowest=include_lowest,
299 dtype=dtype,
300 duplicates=duplicates,
301 ordered=ordered,
302 )
304 return _postprocess_for_cut(fac, bins, retbins, dtype, original)
File ~/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/core/reshape/tile.py:427, in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
424 bins = unique_bins
426 side: Literal["left", "right"] = "left" if right else "right"
--> 427 ids = ensure_platform_int(bins.searchsorted(x, side=side))
429 if include_lowest:
430 ids[np.asarray(x) == bins[0]] = 1
TypeError: '<' not supported between instances of 'int' and 'Timestamp'
Issue Description
- If `bins` is an `IntervalIndex`: `pd.cut` works as expected when `x` is a list of Timestamps, but produces `NaN`s when it is a `pd.Series` or `np.ndarray`.
- If `bins` is a `DatetimeIndex`: `pd.cut` works as expected when `x` is a `pd.Series` or `np.ndarray` of Timestamps, but raises a `TypeError` when it is a list.

`pd.cut` documentation clearly states that `x` can be any 1-dimensional array-like.
Examples that work as expected:
pd.cut(ts.to_list(), bins)
pd.cut(ts, dt_indices)
pd.cut(ts.to_numpy(), dt_indices)
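To compare all six combinations of input container and bin type in one run, something like the sketch below may help. `try_cut` is a hypothetical helper written for this report, not part of pandas. It only records whether `pd.cut` raised and how many values came back missing, so the output will depend on the installed pandas version.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-03"]))
interval_bins = pd.interval_range(ts[0], ts[3])
datetime_bins = pd.date_range(ts[0], ts[3])


def try_cut(x, bins):
    """Hypothetical helper: summarize pd.cut as ('ok', n_missing) or ('error', name)."""
    try:
        result = pd.cut(x, bins)
    except Exception as exc:  # broad on purpose: we only record what happened
        return ("error", type(exc).__name__)
    return ("ok", int(pd.isna(result).sum()))


inputs = {"Series": ts, "ndarray": ts.to_numpy(), "list": ts.to_list()}
all_bins = {"IntervalIndex": interval_bins, "DatetimeIndex": datetime_bins}

# One entry per (input container, bin type) pair.
summary = {
    (x_name, bins_name): try_cut(x, bins)
    for bins_name, bins in all_bins.items()
    for x_name, x in inputs.items()
}

for key, outcome in summary.items():
    print(key, outcome)
```

If every combination behaved consistently, each entry would presumably read `('ok', 1)`, since only the first timestamp lies outside the right-closed bins. On the version reported here, the `Series` and `ndarray` rows for `IntervalIndex` should instead show four missing values, and the `list` row for `DatetimeIndex` should show a `TypeError`.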
This issue is a question on Stack Overflow.
Expected Behavior
[NaN, (2000-01-01, 2000-01-02], (2000-01-01, 2000-01-02], (2000-01-02, 2000-01-03]]
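For reference, the expected result can be built without going through `pd.cut` at all. This is a minimal sketch, assuming that `IntervalIndex.get_indexer` returns `-1` for values outside every interval and that those codes map to `NaN` in `pd.Categorical.from_codes`. It is meant as a cross-check of the expected output, not as a description of how `pd.cut` works internally.

```python
import pandas as pd

x = [pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-02"),
     pd.Timestamp("2000-01-02"), pd.Timestamp("2000-01-03")]
bins = pd.interval_range(x[0], x[-1])

# Align the values with the dtype of the interval endpoints before lookup.
target = pd.DatetimeIndex(x).astype(bins.dtype.subtype)

# Position of the interval containing each value; -1 means "no interval".
codes = bins.get_indexer(target)

# A code of -1 becomes NaN in the resulting Categorical.
expected = pd.Categorical.from_codes(codes, categories=bins, ordered=True)
print(expected)
```

The first value is missing because the intervals are closed on the right only, so `2000-01-01` is not contained in `(2000-01-01, 2000-01-02]`.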
Installed Versions
INSTALLED VERSIONS
commit : e8093ba
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Tue Jun 21 20:50:28 PDT 2022; root:xnu-7195.141.32~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.4.3
numpy : 1.23.2
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 22.2.2
Cython : 0.29.30
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None