Description
The index is lost when the DataFrame melt method is used.
import pandas as pd
import numpy as np
df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]})
df.set_index("Letters", inplace=True)
print(df)
Letters | Numbers_1 | Numbers_2 |
---|---|---|
A | 0 | 3 |
B | 1 | 4 |
C | 2 | 5 |
df_melted = df.melt()
print(df_melted)
. | variable | value |
---|---|---|
0 | Numbers_1 | 0 |
1 | Numbers_1 | 1 |
2 | Numbers_1 | 2 |
3 | Numbers_2 | 3 |
4 | Numbers_2 | 4 |
5 | Numbers_2 | 5 |
Problem description
When melting a DataFrame, I expected the original index to be preserved. Instead, melt discards it and the result gets a default integer index. This is probably what wesm's comment (# TODO: what about the existing index?) refers to:
pandas/pandas/core/reshape/reshape.py
Line 715 in 133a208
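Until melt handles the existing index itself, one possible workaround is to move the index into a regular column and pass it as an identifier variable. This is only a sketch of that idea, not the behavior requested in this issue. The name `df_workaround` is made up for illustration, and the restored index is no longer unique.

```python
import pandas as pd

df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]}).set_index("Letters")

# Turn the index into an ordinary column, then declare it an id variable
# so that melt carries it through instead of discarding it.
df_workaround = df.reset_index().melt(id_vars=["Letters"])

# Optionally put it back as the index. Each label now appears once per
# melted column, so the index contains duplicates.
df_workaround = df_workaround.set_index("Letters")
print(df_workaround)
```

This keeps the Letters labels next to each value, at the cost of an extra reset_index/set_index round trip.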
Expected Output
I would expect something like:
n_row, n_col = df.shape
# Repeat the original index once per melted column: A, B, C, A, B, C
index_melted = list(df.index) * n_col
# Position of the source column for each row, which keeps the new index unique: 0, 0, 0, 1, 1, 1
melt_id = list(np.arange(n_col).repeat(n_row))
temp = list(zip(index_melted, melt_id))
index_melted_uniq = pd.MultiIndex.from_tuples(temp, names=[df.index.names[0], "melt_id"])
# Column-major ("F") ravel lists all values of Numbers_1 first, then Numbers_2
data = {"variable": df.columns.repeat(n_row),
        "value": df.values.ravel("F")}
df_expected = pd.DataFrame(data, columns=["variable", "value"], index=index_melted_uniq)
print(df_expected)
Letters | melt_id | variable | value |
---|---|---|---|
A | 0 | Numbers_1 | 0 |
B | 0 | Numbers_1 | 1 |
C | 0 | Numbers_1 | 2 |
A | 1 | Numbers_2 | 3 |
B | 1 | Numbers_2 | 4 |
C | 1 | Numbers_2 | 5 |
Here, Letters and melt_id are the two levels of a MultiIndex, and variable and value are regular columns.
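For comparison, roughly the same two-level result can be assembled from existing pandas calls. The sketch below assumes that numbering the melted columns in order of first appearance is an acceptable definition of melt_id. The names `melted` and `df_expected_alt` are hypothetical, and `pd.factorize` is used here only as one convenient way to produce that numbering.

```python
import pandas as pd

df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]}).set_index("Letters")

# Keep the original index by melting it as an id variable.
melted = df.reset_index().melt(id_vars=["Letters"])

# Number each source column in order of first appearance:
# 0 for Numbers_1, 1 for Numbers_2.
melted["melt_id"] = pd.factorize(melted["variable"])[0]

# Combine the original index and the column number into a MultiIndex.
df_expected_alt = melted.set_index(["Letters", "melt_id"])
print(df_expected_alt)
```

Because every (Letters, melt_id) pair occurs exactly once, the resulting index is unique, which is the property the melt_id level is meant to provide.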
Output of pd.show_versions()
INSTALLED VERSIONS
commit: d0f62c2816ada96a991f5a624a52c9a4f09617f7
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.21.0.dev+420.gd0f62c2
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.2.post20170724
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None