Open
Description
Melt Enhancement
Summary: This is a proposal with a pull request to enhance melt
to simultaneously melt multiple groups of columns and to add functionality from wide_to_long
along with better MultiIndexing capabilities. See this notebook for more examples.
- Melts different groups of columns by passing a list of lists into
value_vars
. Each group gets melted into its own column. This feature replaces the need for lreshape. - When melting different groups of columns, groups do not have to be the same length. The shorter groups are filled with missing values.
- Adds parameters
stubnames
(boolean),prefix
andsep
from functionwide_to_long
. It keeps the suffixes in separate columns and does not align them in the same way. - Can select any number of MultiIndex levels and greatly increase MultiIndex functionality
- Works with repeated column names, which normally show up when selecting a subset of MultiIndex levels
- Performance is ~30-40% faster than original
melt
, slightly slower thanlreshape
and much faster thanwide_to_long
>>> df = pd.DataFrame({'City': ['Houston', 'Austin', 'Hoover'],
'State': ['Texas', 'Texas', 'Alabama'],
'Name':['Aria', 'Penelope', 'Niko'],
'Mango':[4, 10, 90],
'Orange': [10, 8, 14],
'Watermelon':[40, 99, 43],
'Gin':[16, 200, 34],
'Vodka':[20, 33, 18]},
columns=['City', 'State', 'Name', 'Mango', 'Orange', 'Watermelon', 'Gin', 'Vodka'])
City State Name Mango Orange Watermelon Gin Vodka
0 Houston Texas Aria 4 10 40 16 20
1 Austin Texas Penelope 10 8 99 200 33
2 Hoover Alabama Niko 90 14 43 34 18
Use a list of lists in value_vars
to melt the fruit and drinks
>>> df.melt(id_vars=['City', 'State'], value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']],
var_name=['Fruit', 'Drink'], value_name=['Pounds', 'Ounces'])
City State Fruit Pounds Drink Ounces
0 Houston Texas Mango 4 Gin 16.0
1 Austin Texas Mango 10 Gin 200.0
2 Hoover Alabama Mango 90 Gin 34.0
3 Houston Texas Orange 10 Vodka 20.0
4 Austin Texas Orange 8 Vodka 33.0
5 Hoover Alabama Orange 14 Vodka 18.0
6 Houston Texas Watermelon 40 nan NaN
7 Austin Texas Watermelon 99 nan NaN
8 Hoover Alabama Watermelon 43 nan NaN
wide_to_long
functionality. Added parameters stubnames
(boolean), sep
and suffix
.
>>> df1 = pd.DataFrame({'group': ['a', 'b', 'c'],
'exp_1':[4, 10, -9],
'exp_2': [10, 8, 14],
'res_1':[8, 5, 4],
'res_3':[11, 0, 7]}, columns=['group', 'exp_1', 'exp_2', 'res_1', 'res_3'])
group exp_1 exp_2 res_1 res_3
0 a 4 10 8 11
1 b 10 8 5 0
2 c -9 14 4 7
>>> df1.melt(id_vars='group', value_vars=['exp','res'], stubnames=True, sep='_')
group variable_exp exp variable_res res
0 a 1 4 1 8
1 b 1 10 1 5
2 c 1 -9 1 4
3 a 2 10 3 11
4 b 2 8 3 0
5 c 2 14 3 7
Also adds support for all kinds of multiindexing
>>> df2 = df.copy()
>>> df2.columns = pd.MultiIndex.from_arrays([list('aabbcccd'), list('ffffgggg'), df.columns],
names=[None, None, 'some vars'])
a b c d
f f g g
some vars City State Name Mango Orange Watermelon Gin Vodka
0 Houston Texas Aria 4 10 40 16 20
1 Austin Texas Penelope 10 8 99 200 33
2 Hoover Alabama Niko 90 14 43 34 18
>>> df2.melt(id_vars=[('a', 'f', 'State')],
value_vars=[[('b', 'f', 'Name'), ('c', 'g', 'Watermelon')],
[('b','f','Mango'), ('c','g', 'Orange'), ('d', 'g', 'Vodka')]],
var_name=[['myvar1', 'myvar2', 'myvar3'],
['next_myvar1', 'next_myvar2', 'next_myvar3']],
value_name=['some values', 'more_values'])
(a, f, State) myvar1 myvar2 myvar3 some values next_myvar1 next_myvar2 \
0 Texas b f Name Aria b f
1 Texas b f Name Penelope b f
2 Alabama b f Name Niko b f
3 Texas c g Watermelon 40 c g
4 Texas c g Watermelon 99 c g
5 Alabama c g Watermelon 43 c g
6 Texas nan nan nan NaN d g
7 Texas nan nan nan NaN d g
8 Alabama nan nan nan NaN d g
next_myvar3 more_values
0 Mango 4
1 Mango 10
2 Mango 90
3 Orange 10
4 Orange 8
5 Orange 14
6 Vodka 20
7 Vodka 33
8 Vodka 18
Problem description
Currently, there is poor support for simultaneous melting of multiple groups of columns. lreshape
is old and undocumented. wide_to_long
api does not match melt
and it's slow.