DataFrame.__setitem__ converts Index to RangeIndex for length-zero value #22060

Closed
@s-wakaba

Code Sample

#!/usr/bin/env python3
import pandas as pd
from datetime import datetime

a = pd.DataFrame([[datetime.now(), 1234, 3.1415]], columns=['col0', 'col1', 'col2']).iloc[[]]
print(a.shape)
# (0, 3) <- 0 rows, 3 columns
print(a.dtypes)
# col0    datetime64[ns]
# col1             int64
# col2           float64
# dtype: object

b = a.set_index(['col0'])
print(b.reset_index().dtypes)
# col0    datetime64[ns] <- all preserved
# col1             int64
# col2           float64
# dtype: object
b['col3'] = []
print(b.reset_index().dtypes)
# index      int64 <- column name is lost and dtype is changed to int64
# col1       int64
# col2     float64
# col3     float64
# dtype: object

c = a.set_index(['col0', 'col1'])
print(c.reset_index().dtypes)
# col0    float64 <- column names are preserved but dtypes are changed to float64
# col1    float64 <-
# col2    float64
# dtype: object
c['col3'] = []
print(c.reset_index().dtypes)
# col0    float64 <- column names are still preserved
# col1    float64 <-
# col2    float64
# col3    float64
# dtype: object

Problem description

When certain operations are performed on DataFrames with zero rows, various pieces of index information are lost. Furthermore, which information is lost, and what triggers the loss, is inconsistent between a MultiIndex and a regular Index.

For a DataFrame with a non-MultiIndex, both the dtype and x.index.name are lost when a new column is added by assigning an empty list.
For a DataFrame with a MultiIndex, the dtypes are lost as soon as x.set_index([x, y, ...]) is called. However, x.index.names are preserved when a new column is added.
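
To make the comparison above easier to reproduce, the following is a minimal sketch of a diagnostic helper. The name `describe_index` and the column names `key`, `other`, `value`, and `extra` are hypothetical and not part of the original report. It lists the name and dtype of every index level, for both a regular Index and a MultiIndex, so the state before and after each operation can be printed side by side. The output depends on the installed pandas version, so none is shown.

```python
import pandas as pd


def describe_index(df):
    """Return a (name, dtype) pair for every index level of df."""
    index = df.index
    return [
        (name, str(index.get_level_values(i).dtype))
        for i, name in enumerate(index.names)
    ]


# Build a zero-row frame, then inspect index metadata step by step.
empty = pd.DataFrame({"key": [1], "other": [2], "value": [0.5]}).iloc[[]]

single = empty.set_index("key")
print("single, before:", describe_index(single))
single["extra"] = []  # the assignment that triggers the report's first case
print("single, after: ", describe_index(single))

multi = empty.set_index(["key", "other"])
print("multi, before: ", describe_index(multi))
multi["extra"] = []
print("multi, after:  ", describe_index(multi))
```

Running the same helper on a frame with one or more rows gives a baseline to compare against the zero-row results.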

Expected Output

In my opinion, preserving all dtypes and names in each of these example cases would have few negative effects, and it would be consistent with how the same operations behave on DataFrames with a nonzero number of rows.
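
As a possible workaround for the single-index case (an assumption on my part, not something confirmed in this issue), the new column can be assigned as an empty Series that already carries the frame's own index, instead of a bare empty list. The helper name `add_empty_column` is hypothetical. The sketch records the index name and dtype before and after the assignment so the two can be compared.

```python
import pandas as pd


def add_empty_column(df, name, dtype="float64"):
    """Add a zero-length column aligned to df's existing index (modifies df in place)."""
    # Assumption: passing a Series that shares df.index avoids the
    # list-like code path in which the index gets replaced.
    df[name] = pd.Series([], index=df.index, dtype=dtype)
    return df


frame = pd.DataFrame(
    {"col0": pd.to_datetime(["2018-07-25"]), "col1": [1234], "col2": [3.1415]}
).iloc[[]]

b = frame.set_index("col0")
before = (b.index.name, b.index.dtype)
add_empty_column(b, "col3")
after = (b.index.name, b.index.dtype)

print("before:", before)
print("after: ", after)
```

This does not address the MultiIndex case, where the report says the dtypes are already lost at the `set_index` call.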

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.3.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


    Labels

    Dtype Conversions (unexpected or buggy dtype conversions), Indexing (related to indexing on series/frames, not to indexes themselves), Needs Tests (unit test(s) needed to prevent regressions), good first issue
