Functions that rely on hash tables are incorrect for complex numbers #17927

Closed
@jschendel

Description

Several functions that rely on hash tables return incorrect results when complex numbers are present. They appear to use only the real component of each value and ignore the imaginary component.

Consider the following list of values:

In [2]: x = [0, 1j, 1, 1+1j, 1+2j]

In [3]: x
Out[3]: [0, 1j, 1, (1+1j), (1+2j)]

Using value_counts:

In [4]: pd.value_counts(x)
Out[4]:
(1+0j)    3
0j        2
dtype: int64
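
The counts above are what you would get if values were keyed on their real component alone. A minimal pure-Python sketch of that hypothesis follows; it is not pandas' actual hash table code, and the names `counts_real_only` and `counts_full` are made up for illustration:

```python
from collections import Counter

x = [0, 1j, 1, 1 + 1j, 1 + 2j]

# Hypothetical model of the bug: the key is the real component only,
# so values that differ only in their imaginary part collide.
counts_real_only = Counter(complex(v).real for v in x)

# Expected behaviour: the key is the full complex value.
counts_full = Counter(complex(v) for v in x)

print(counts_real_only)
print(counts_full)
```

Keying on the real part gives counts of 3 and 2, matching Out[4], while keying on the full value gives five distinct entries with a count of 1 each.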

Using unique:

In [5]: pd.unique(x)
Out[5]: array([ 0.,  1.])

Using duplicated:

In [6]: pd.Series(x).duplicated()
Out[6]:
0    False
1     True
2    False
3     True
4     True
dtype: bool

Using isin:

In [7]: pd.Series(x).isin([1j, 1+1j, 1+2j])
Out[7]:
0    False
1    False
2    False
3    False
4    False
dtype: bool

In [8]: pd.Series(x).isin([0, 1])
Out[8]:
0    True
1    True
2    True
3    True
4    True
dtype: bool
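
For comparison, here is a plain-Python sketch of what `duplicated` and `isin` should return for `x` under full complex equality. The helpers `expected_duplicated` and `expected_isin` are hypothetical reference implementations built on Python sets, not pandas functions:

```python
x = [0, 1j, 1, 1 + 1j, 1 + 2j]

def expected_duplicated(values):
    """Flag each value that has already appeared earlier in the list."""
    seen = set()
    flags = []
    for v in values:
        flags.append(v in seen)
        seen.add(v)
    return flags

def expected_isin(values, candidates):
    """Flag each value that is equal to one of the candidates."""
    lookup = set(candidates)
    return [v in lookup for v in values]

print(expected_duplicated(x))
print(expected_isin(x, [1j, 1 + 1j, 1 + 2j]))
print(expected_isin(x, [0, 1]))
```

Python hashes and compares the full complex value, so no element of `x` is a duplicate, the first `isin` call matches only positions 1, 3 and 4, and the second matches only positions 0 and 2.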

factorize fails as described in #16399, and several other functions (rank, nunique, mode, etc.) fail in similar ways.

Note that these appear to work if the dtype is explicitly set to object instead of being left as the inferred complex dtype:

In [9]: pd.Series(x, dtype=object).isin([0, 1])
Out[9]:
0     True
1    False
2     True
3    False
4    False
dtype: bool
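
Until the complex code path is fixed, the object-dtype behaviour shown above could be wrapped as a workaround. This is only a sketch: `isin_via_object` is a hypothetical helper name, it assumes the object-dtype path behaves as in Out[9], and object dtype is typically slower than a native numeric dtype:

```python
import pandas as pd

def isin_via_object(values, candidates):
    """Hypothetical helper: run isin on an object-dtype Series."""
    # dtype=object keeps the elements as Python objects, which is the
    # case the report shows returning correct results.
    return pd.Series(values, dtype=object).isin(candidates)

x = [0, 1j, 1, 1 + 1j, 1 + 2j]
print(isin_via_object(x, [0, 1]).tolist())
```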

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 51c5f4d
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0rc1+26.g51c5f4d
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None
