Skip to content

Simultaneously melt multiple columns #17676

Open
@tdpetrou

Description

@tdpetrou

Melt Enhancement

Summary: This is a proposal with a pull request to enhance melt to simultaneously melt multiple groups of columns and to add functionality from wide_to_long along with better MultiIndexing capabilities. See this notebook for more examples.

  • Melts different groups of columns by passing a list of lists into value_vars. Each group gets melted into its own column. This feature replaces the need for lreshape.
  • When melting different groups of columns, groups do not have to be the same length. The shorter groups are filled with missing values.
  • Adds parameters stubnames(boolean), prefix and sep from function wide_to_long. It keeps the suffixes in separate columns and does not align them in the same way.
  • Can select any number of MultiIndex levels and greatly increase MultiIndex functionality
  • Works with repeated column names, which normally show up when selecting a subset of MultiIndex levels
  • Performance is ~30-40% faster than original melt, slightly slower than lreshape and much faster than wide_to_long
>>> df = pd.DataFrame({'City': ['Houston', 'Austin', 'Hoover'],
                   'State': ['Texas', 'Texas', 'Alabama'],
                   'Name':['Aria', 'Penelope', 'Niko'],
                   'Mango':[4, 10, 90],
                   'Orange': [10, 8, 14], 
                   'Watermelon':[40, 99, 43],
                   'Gin':[16, 200, 34],
                   'Vodka':[20, 33, 18]},
                 columns=['City', 'State', 'Name', 'Mango', 'Orange', 'Watermelon', 'Gin', 'Vodka'])

      City    State      Name  Mango  Orange  Watermelon  Gin  Vodka
0  Houston    Texas      Aria      4      10          40   16     20
1   Austin    Texas  Penelope     10       8          99  200     33
2   Hoover  Alabama      Niko     90      14          43   34     18

Use a list of lists in value_vars to melt the fruit and drinks

>>> df.melt(id_vars=['City', 'State'], value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']], 
                    var_name=['Fruit', 'Drink'], value_name=['Pounds', 'Ounces'])

      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    nan     NaN
7   Austin    Texas  Watermelon      99    nan     NaN
8   Hoover  Alabama  Watermelon      43    nan     NaN

wide_to_long functionality. Added parameters stubnames(boolean), sep and suffix.

>>> df1 = pd.DataFrame({'group': ['a', 'b', 'c'],
                   'exp_1':[4, 10, -9],
                   'exp_2': [10, 8, 14], 
                   'res_1':[8, 5, 4],
                   'res_3':[11, 0, 7]}, columns=['group', 'exp_1', 'exp_2', 'res_1', 'res_3'])

  group  exp_1  exp_2  res_1  res_3
0     a      4     10      8     11
1     b     10      8      5      0
2     c     -9     14      4      7

>>> df1.melt(id_vars='group', value_vars=['exp','res'], stubnames=True, sep='_')

  group  variable_exp  exp  variable_res  res
0     a             1    4             1    8
1     b             1   10             1    5
2     c             1   -9             1    4
3     a             2   10             3   11
4     b             2    8             3    0
5     c             2   14             3    7

Also adds support for all kinds of multiindexing

>>> df2 = df.copy()
>>> df2.columns = pd.MultiIndex.from_arrays([list('aabbcccd'), list('ffffgggg'), df.columns], 
                                       names=[None, None, 'some vars'])

                 a                  b            c                     d
                 f                  f            g                     g
some vars     City    State      Name Mango Orange Watermelon  Gin Vodka
0          Houston    Texas      Aria     4     10         40   16    20
1           Austin    Texas  Penelope    10      8         99  200    33
2           Hoover  Alabama      Niko    90     14         43   34    18

>>> df2.melt(id_vars=[('a', 'f', 'State')], 
           value_vars=[[('b', 'f', 'Name'), ('c', 'g', 'Watermelon')],
                       [('b','f','Mango'), ('c','g', 'Orange'), ('d', 'g', 'Vodka')]],
           var_name=[['myvar1', 'myvar2', 'myvar3'],
                     ['next_myvar1', 'next_myvar2', 'next_myvar3']],
           value_name=['some values', 'more_values'])

 (a, f, State) myvar1 myvar2      myvar3 some values next_myvar1 next_myvar2  \
0         Texas      b      f        Name        Aria           b           f   
1         Texas      b      f        Name    Penelope           b           f   
2       Alabama      b      f        Name        Niko           b           f   
3         Texas      c      g  Watermelon          40           c           g   
4         Texas      c      g  Watermelon          99           c           g   
5       Alabama      c      g  Watermelon          43           c           g   
6         Texas    nan    nan         nan         NaN           d           g   
7         Texas    nan    nan         nan         NaN           d           g   
8       Alabama    nan    nan         nan         NaN           d           g   

  next_myvar3  more_values  
0       Mango            4  
1       Mango           10  
2       Mango           90  
3      Orange           10  
4      Orange            8  
5      Orange           14  
6       Vodka           20  
7       Vodka           33  
8       Vodka           18 

Problem description

Currently, there is poor support for simultaneous melting of multiple groups of columns. lreshape is old and undocumented. wide_to_long api does not match melt and it's slow.

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions