ENH: get_dummies on DataFrames #8133

Closed
@TomAugspurger

get_dummies currently just expects a Series.

In [17]: data
Out[17]: 
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   

                                                Name     Sex  Age  SibSp  \
0                            Braund, Mr. Owen Harris    male   22      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1   

   Parch     Ticket     Fare Cabin Embarked  
0      0  A/5 21171   7.2500   NaN        S  
1      0   PC 17599  71.2833   C85        C  

If it took DataFrames we could change the required call from

features = pd.concat([data.get(['Fare', 'Age']),
                      pd.get_dummies(data.Sex, prefix='Sex'),
                      pd.get_dummies(data.Pclass, prefix='Pclass'),
                      pd.get_dummies(data.Embarked, prefix='Embarked')],
                     axis=1)

to

features = pd.get_dummies(data)

We'd infer that columns with object dtype need to be encoded as 0s and 1s, but also accept arguments to explicitly include or exclude a column from encoding.
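To make the inference rule concrete, here is a minimal sketch using the `columns` keyword that released versions of pandas expose on `pd.get_dummies`. The issue text does not name the argument, so treat `columns` as the eventual spelling rather than part of this proposal; the toy frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame loosely modelled on the Titanic sample above (illustrative values).
data = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})

# Default: only string/object (and categorical) columns are encoded;
# numeric columns such as Fare and Pclass pass through unchanged.
inferred = pd.get_dummies(data)
print(list(inferred.columns))

# Explicit: name the columns to encode, here including the integer-coded Pclass.
explicit = pd.get_dummies(data, columns=["Sex", "Pclass"])
print(list(explicit.columns))
```

Note that `pd.get_dummies(data)` on the full Titanic frame would also encode `Name`, `Ticket`, and `Cabin` and would leave `Pclass` numeric, so matching the `pd.concat` version above would probably need an explicit column list.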

Output column names will automatically be prefixed with the original column name; this can be overridden by passing a list or dictionary to the prefix kwarg.

The same goes for prefix separators: prefix_sep would also accept a list or dictionary.
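A small sketch of how the prefix options could look in practice, assuming the `prefix` and `prefix_sep` keywords as they exist in released pandas. The prefixes and separators chosen here (`is`, `port`, `.`, `-`) are arbitrary examples, not anything from the issue.

```python
import pandas as pd

data = pd.DataFrame({
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
})

# Default: the original column name is the prefix and "_" the separator.
default = pd.get_dummies(data)
print(list(default.columns))

# Dictionaries map each encoded column to its own prefix and separator.
renamed = pd.get_dummies(
    data,
    prefix={"Sex": "is", "Embarked": "port"},
    prefix_sep={"Sex": "_", "Embarked": "."},
)
print(list(renamed.columns))

# A list is matched positionally against the encoded columns,
# so its length must equal the number of columns being encoded.
listed = pd.get_dummies(data, prefix=["sex", "port"], prefix_sep="-")
print(list(listed.columns))
```

The dictionary form is probably the safer choice when the set of encoded columns is inferred, since it does not depend on column order.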

On NaN handling, I think we'll have one {prefix}_NaN output column per original column when dummy_na is True.
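As a rough illustration of the NaN handling, the sketch below uses `dummy_na=True` on a two-column toy frame (values are illustrative). One caveat: released pandas appears to spell the indicator suffix in lowercase (`_nan`) rather than `_NaN` as written above, so the check here matches the suffix case-insensitively instead of assuming either spelling.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Cabin": [np.nan, "C85"],
    "Embarked": ["S", "C"],
})

# dummy_na=True adds one missing-value indicator per encoded column,
# even for columns (like Embarked here) that contain no NaN.
with_na = pd.get_dummies(data, dummy_na=True)

# Collect the indicator columns without relying on the exact spelling.
na_cols = [c for c in with_na.columns if str(c).lower().endswith("_nan")]
print(list(with_na.columns))
print(na_cols)
```

Without `dummy_na`, the row with a missing `Cabin` would simply have zeros in every `Cabin_*` column.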

I've got some tests written already.

Labels: API Design, Categorical, Reshaping