ENH: get_dummies on DataFrames

`get_dummies` currently only accepts a Series.

``` python
In [17]: data
Out[17]: 
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   

                                                Name     Sex  Age  SibSp  \
0                            Braund, Mr. Owen Harris    male   22      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1   

   Parch     Ticket     Fare Cabin Embarked  
0      0  A/5 21171   7.2500   NaN        S  
1      0   PC 17599  71.2833   C85        C  
```

If it accepted DataFrames, we could simplify the required call from

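```python
import pandas as pd

# `data` is the Titanic DataFrame shown above.
# Numeric columns pass through unchanged; each categorical column is
# encoded separately and concatenated alongside them.
features = pd.concat([data[['Fare', 'Age']],
                      pd.get_dummies(data.Sex, prefix='Sex'),
                      pd.get_dummies(data.Pclass, prefix='Pclass'),
                      pd.get_dummies(data.Embarked, prefix='Embarked')],
                     axis=1)
```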

to

```python
features = pd.get_dummies(data)
```

We'll infer that columns with `object` dtype should be encoded as 0s and 1s, but also accept arguments to explicitly encode a column, or to leave it unencoded.
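A minimal sketch of the inference rule, written as a hypothetical `get_dummies_frame` helper on top of the current Series-only `pd.get_dummies`. It assumes `columns=None` means "infer from dtype" and that an explicit list replaces the inference entirely, so a column can be forced in (e.g. the integer `Pclass`) or left out by omitting it. `dtype=int` (an argument in newer pandas) is used to get 0/1 rather than booleans. The frame is a subset of the two rows shown above.

```python
import pandas as pd


def get_dummies_frame(df, columns=None):
    """Hypothetical helper sketching the proposed DataFrame behaviour.

    columns=None  -> encode every object (or, in newer pandas, string) column
    columns=[...] -> encode exactly these columns, whatever their dtype
    """
    if columns is None:
        # Infer: only text-like columns become 0/1 indicator columns.
        columns = [
            col for col in df.columns
            if pd.api.types.is_object_dtype(df[col])
            or pd.api.types.is_string_dtype(df[col])
        ]
    # Columns we don't encode are passed through unchanged.
    untouched = df.drop(columns=columns)
    # Encode each selected column separately, prefixed with its own name.
    encoded = [pd.get_dummies(df[col], prefix=col, dtype=int) for col in columns]
    return pd.concat([untouched] + encoded, axis=1)


data = pd.DataFrame({
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Fare": [7.25, 71.2833],
    "Embarked": ["S", "C"],
})

# Default: Sex and Embarked are inferred; Pclass and Fare pass through.
print(list(get_dummies_frame(data).columns))
# ['Pclass', 'Fare', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_S']

# Explicit: encode the integer Pclass column and leave everything else alone.
print(list(get_dummies_frame(data, columns=["Pclass"]).columns))
# ['Sex', 'Fare', 'Embarked', 'Pclass_1', 'Pclass_3']
```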

Output column names will automatically use the original column name as a prefix; this can be overridden with the `prefix` kwarg by passing a list or a dictionary.

The same applies to prefix separators (`prefix_sep`).
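A sketch of how `prefix` and `prefix_sep` might be expanded to one value per encoded column. `resolve_per_column` and `get_dummies_prefixed` are hypothetical names; accepting a plain string for every column, and falling back to the default for columns missing from a dict, are assumptions rather than settled behaviour. A list is matched to the encoded columns by position.

```python
import pandas as pd


def resolve_per_column(value, columns, default):
    """Hypothetical helper: expand a prefix/prefix_sep argument to one value per column.

    None -> default(col) for every column
    str  -> the same string for every column
    list -> matched to `columns` by position (lengths must agree)
    dict -> looked up by column name, falling back to default(col)
    """
    if value is None:
        return {col: default(col) for col in columns}
    if isinstance(value, str):
        return {col: value for col in columns}
    if isinstance(value, dict):
        return {col: value.get(col, default(col)) for col in columns}
    if len(value) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(value)}")
    return dict(zip(columns, value))


def get_dummies_prefixed(df, columns, prefix=None, prefix_sep="_"):
    # By default each column is its own prefix, separated by "_".
    prefixes = resolve_per_column(prefix, columns, default=lambda col: col)
    seps = resolve_per_column(prefix_sep, columns, default=lambda col: "_")
    encoded = [
        pd.get_dummies(df[col], prefix=prefixes[col], prefix_sep=seps[col], dtype=int)
        for col in columns
    ]
    return pd.concat([df.drop(columns=columns)] + encoded, axis=1)


data = pd.DataFrame({
    "Sex": ["male", "female"],
    "Fare": [7.25, 71.2833],
    "Embarked": ["S", "C"],
})
# The dict prefix overrides only Embarked; the list prefix_sep is matched by position.
out = get_dummies_prefixed(data, ["Sex", "Embarked"],
                           prefix={"Embarked": "Port"}, prefix_sep=["=", "_"])
print(list(out.columns))
# ['Fare', 'Sex=female', 'Sex=male', 'Port_C', 'Port_S']
```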

On NaN handling, I think we'll have one `{prefix}_NaN` output column per original column when `dummy_na` is True.
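A sketch of that NaN rule, assuming a hypothetical `get_dummies_with_nan` helper that appends exactly one `<col>_NaN` indicator per encoded column, even when the column has no missing values. It builds the NaN column itself with `isna()` rather than relying on the naming pandas uses for `dummy_na=True` on a Series. The values are the `Fare`, `Cabin` and `Embarked` entries from the two rows shown above.

```python
import numpy as np
import pandas as pd


def get_dummies_with_nan(df, columns):
    """Hypothetical sketch of dummy_na=True for a DataFrame.

    Each encoded column gets indicators for its observed values plus
    exactly one '<col>_NaN' column that is 1 where the value was missing.
    """
    pieces = [df.drop(columns=columns)]
    for col in columns:
        # Indicators for the observed values; missing rows are all zeros here.
        dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
        # One NaN indicator per original column, named as in the proposal.
        dummies[f"{col}_NaN"] = df[col].isna().astype(int)
        pieces.append(dummies)
    return pd.concat(pieces, axis=1)


data = pd.DataFrame({
    "Fare": [7.25, 71.2833],
    "Cabin": [np.nan, "C85"],
    "Embarked": ["S", "C"],
})
out = get_dummies_with_nan(data, ["Cabin", "Embarked"])
print(list(out.columns))
# ['Fare', 'Cabin_C85', 'Cabin_NaN', 'Embarked_C', 'Embarked_S', 'Embarked_NaN']
```

`Embarked` has no missing values in these two rows, yet still gets an all-zero `Embarked_NaN` column, so the output shape does not depend on which rows happen to contain NaN.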

I've got some tests written already.

ENH: get_dummies on DataFrames #8133