Description
Several functions that rely on hash tables give incorrect results when complex numbers are present. They appear to use only the real component of each value.
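Here is a rough illustration of that hypothesis. This is only a simulation in plain Python and does not use pandas' actual hashtable code. Keying each value on its real part alone reproduces the value_counts, unique and duplicated outputs reported below. The helper `real_key` is hypothetical.

```python
from collections import Counter

x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def real_key(v):
    # Hypothetical stand-in for the suspected bug: keep only the real part.
    return complex(v).real

keys = [real_key(v) for v in x]      # [0.0, 0.0, 1.0, 1.0, 1.0]

# Mimics the buggy value_counts: 1.0 -> 3, 0.0 -> 2
counts = Counter(keys)

# Mimics the buggy unique: first-appearance order, [0.0, 1.0]
uniques = list(dict.fromkeys(keys))

# Mimics the buggy duplicated (keep='first')
seen = set()
dupes = []
for k in keys:
    dupes.append(k in seen)
    seen.add(k)
```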
Consider the following list of values:
In [2]: x = [0, 1j, 1, 1+1j, 1+2j]
In [3]: x
Out[3]: [0, 1j, 1, (1+1j), (1+2j)]
Using value_counts:
In [4]: pd.value_counts(x)
Out[4]:
(1+0j) 3
0j 2
dtype: int64
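For comparison, counting the raw values with Python's collections.Counter, which hashes the full complex value, gives what one would expect value_counts to return here: five distinct values, each seen once. This is a plain-Python sketch of the expected result, not pandas code.

```python
from collections import Counter

x = [0, 1j, 1, 1 + 1j, 1 + 2j]

# Counter hashes each full value, so 0 and 1j (or 1 and 1+1j) stay distinct.
expected_counts = Counter(x)
```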
Using unique:
In [5]: pd.unique(x)
Out[5]: array([ 0., 1.])
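The expected result would keep all five values, since none of them are equal. A minimal plain-Python sketch of order-preserving uniqueness (assuming pd.unique's first-appearance ordering) is:

```python
x = [0, 1j, 1, 1 + 1j, 1 + 2j]

# dict.fromkeys keeps the first occurrence of each distinct value, in order.
expected_unique = list(dict.fromkeys(x))
```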
Using duplicated:
In [6]: pd.Series(x).duplicated()
Out[6]:
0 False
1 True
2 False
3 True
4 True
dtype: bool
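Because all five values are distinct, duplicated should return False everywhere. The sketch below mirrors the default keep='first' behaviour in plain Python; `expected_duplicated` is a hypothetical helper, not a pandas function.

```python
def expected_duplicated(values):
    """Flag a value True if an equal value appeared earlier (keep='first')."""
    seen = set()
    flags = []
    for v in values:
        flags.append(v in seen)
        seen.add(v)
    return flags

x = [0, 1j, 1, 1 + 1j, 1 + 2j]
flags = expected_duplicated(x)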
Using isin:
In [7]: pd.Series(x).isin([1j, 1+1j, 1+2j])
Out[7]:
0 False
1 False
2 False
3 False
4 False
dtype: bool
In [8]: pd.Series(x).isin([0, 1])
Out[8]:
0 True
1 True
2 True
3 True
4 True
dtype: bool
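Membership tested against a set of the full complex targets gives the results one would expect from isin. The second call matches the object-dtype output further below. `expected_isin` is a hypothetical helper used only for this comparison.

```python
x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def expected_isin(values, targets):
    # Python's hash/equality compare the full complex value.
    lookup = set(targets)
    return [v in lookup for v in values]

complex_hits = expected_isin(x, [1j, 1 + 1j, 1 + 2j])
real_hits = expected_isin(x, [0, 1])
```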
Using factorize:
This fails as described in #16399. Several other functions fail in similar ways (rank, nunique, mode, etc.).
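For reference, a plain-Python sketch of what factorize and nunique would be expected to return for this list: five distinct codes and five unique values. `expected_factorize` is a hypothetical helper that assigns codes in order of first appearance, as pd.factorize does by default.

```python
x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def expected_factorize(values):
    # Give each distinct value an integer code in order of first appearance.
    codes, uniques, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(uniques)
            uniques.append(v)
        codes.append(index[v])
    return codes, uniques

codes, uniques = expected_factorize(x)
n_unique = len(uniques)   # what nunique would be expected to report
```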
Note that these appear to work if the dtype is explicitly set to object instead of the default complex128:
In [9]: pd.Series(x, dtype=object).isin([0, 1])
Out[9]:
0 True
1 False
2 True
3 False
4 False
dtype: bool
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 51c5f4d
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.21.0rc1+26.g51c5f4d
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None