Inconsistent handling of duplicate axes + mixed dtypes in DataFrame.where #25399

Closed
@batterseapower

Description

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# One row, two columns that share the label 'A'
result = pd.DataFrame([
    [0, np.nan]
], columns=pd.Index(['A', 'A']))

# All-True mask with the same index and columns as result
mask = pd.DataFrame([[True, True]], *result.axes)

a = result.astype(object).where(mask) # works
b = result.astype('f8').where(mask) # works
c = result.T.where(mask.T).T # works
d = result.where(mask) # fails: "cannot reindex from a duplicate axis"

Problem description

It doesn't make sense that a, b and c work but d doesn't. The dtype of a column shouldn't affect whether or not masking succeeds.
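To see why the title mentions both duplicate axes and mixed dtypes, it can help to inspect the frame from the code sample. The snippet below is a minimal sketch, not part of the original report; the names `frame`, `column_dtypes` and `has_duplicate_labels` are illustrative. It assumes pandas infers a dtype for each column separately, so the integer and the NaN would likely end up in columns of different dtypes.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: one row, two columns that share the label 'A'.
frame = pd.DataFrame([[0, np.nan]], columns=pd.Index(['A', 'A']))

# One dtype per column, in column order.
column_dtypes = list(frame.dtypes)
print(column_dtypes)

# True when at least one column label is repeated.
has_duplicate_labels = not frame.columns.is_unique
print(has_duplicate_labels)
```

Variants a and b cast everything to a single dtype before masking, which may be why they avoid the failing code path.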

Expected Output

a.astype('f8').equals(b.astype('f8')) and b.astype('f8').equals(c.astype('f8')) and c.astype('f8').equals(d.astype('f8'))
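The expected-output expression could be turned into a small check that runs on both affected and fixed pandas versions. This is a sketch under stated assumptions: `run_where_variants` and `all_consistent` are hypothetical helper names, not pandas API, and any exception from a variant is recorded instead of being raised.

```python
import numpy as np
import pandas as pd


def run_where_variants():
    """Run the four masking variants from the report and collect the outcomes."""
    result = pd.DataFrame([[0, np.nan]], columns=pd.Index(['A', 'A']))
    mask = pd.DataFrame([[True, True]],
                        index=result.index, columns=result.columns)

    variants = {
        'a': lambda: result.astype(object).where(mask),
        'b': lambda: result.astype('f8').where(mask),
        'c': lambda: result.T.where(mask.T).T,
        'd': lambda: result.where(mask),
    }

    outcomes = {}
    for name, build in variants.items():
        try:
            # Normalise to float64 so that only the values are compared.
            outcomes[name] = build().astype('f8')
        except Exception as exc:  # affected versions fail on variant 'd'
            outcomes[name] = exc
    return outcomes


def all_consistent(outcomes):
    """Return True only if every variant succeeded and all results are equal."""
    frames = list(outcomes.values())
    if any(isinstance(frame, Exception) for frame in frames):
        return False
    return all(frames[0].equals(other) for other in frames[1:])


outcomes = run_where_variants()
print(all_consistent(outcomes))
```

On a pandas version with the behaviour described in this report, the entry for `d` would hold the exception and the check would print `False`.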

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: 3.1.2
pip: 19.0.2
setuptools: 39.0.1
Cython: 0.27.2
numpy: 1.16.1
scipy: 1.2.1
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.3.0.dev0
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Labels

Indexing: Related to indexing on series/frames, not to indexes themselves
Needs Tests: Unit test(s) needed to prevent regressions
Testing: pandas testing functions or related to the test suite
good first issue
