get_dummies currently only accepts a Series.
In [17]: data
Out[17]:
   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1

                                                Name     Sex  Age  SibSp  \
0                            Braund, Mr. Owen Harris    male   22      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1

   Parch     Ticket     Fare  Cabin  Embarked
0      0  A/5 21171   7.2500    NaN         S
1      0   PC 17599  71.2833    C85         C
If it took DataFrames we could change the required call from

    features = pd.concat([data.get(['Fare', 'Age']),
                          pd.get_dummies(data.Sex, prefix='Sex'),
                          pd.get_dummies(data.Pclass, prefix='Pclass'),
                          pd.get_dummies(data.Embarked, prefix='Embarked')],
                         axis=1)

to

    features = pd.get_dummies(data)
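As a rough illustration of the proposed behavior, here is a minimal pure-Python sketch, not pandas' implementation. It assumes a frame is represented as a dict mapping column names to equal-length lists, treats any column holding strings as a stand-in for object dtype, assumes no missing values, and uses a hypothetical function name, get_dummies_frame.

```python
from typing import Any, Dict, List

Frame = Dict[str, List[Any]]


def get_dummies_frame(frame: Frame, sep: str = "_") -> Frame:
    """Hypothetical sketch: one-hot encode string columns, pass others through."""
    out: Frame = {}
    for name, values in frame.items():
        # Stand-in for "object dtype": any column containing a str value.
        if any(isinstance(v, str) for v in values):
            # One 0/1 column per distinct level (sorted), named with the
            # original column name as the prefix. Assumes no missing values.
            for level in sorted(set(values)):
                out[f"{name}{sep}{level}"] = [int(v == level) for v in values]
        else:
            # Non-string columns are copied through unchanged.
            out[name] = list(values)
    return out


data = {
    "Fare": [7.25, 71.2833],
    "Age": [22, 38],
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
}
features = get_dummies_frame(data)
print(list(features))
# ['Fare', 'Age', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_S']
```

Under this inference rule an integer column such as Pclass passes through unchanged, so it would have to be requested explicitly to be encoded.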
We'll infer that columns with object dtype need to be encoded as 0s and 1s, but also accept arguments to explicitly encode a column or leave it alone.
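The inference rule plus an explicit override could look something like the following. This is a hypothetical sketch: the helper name columns_to_encode, its columns parameter, the dict-of-lists frame, and treating str values as object dtype are all assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


def columns_to_encode(
    frame: Dict[str, List[Any]],
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical: decide which columns get 0/1 dummy columns."""
    if columns is not None:
        # An explicit list wins: numeric codes such as Pclass can be encoded,
        # and string columns left out of the list are not. Names absent from
        # the frame are ignored in this sketch.
        return [c for c in frame if c in columns]
    # Otherwise infer: encode only "object-like" (string-valued) columns.
    return [c for c, vals in frame.items() if any(isinstance(v, str) for v in vals)]


data = {"Fare": [7.25, 71.2833], "Pclass": [3, 1], "Sex": ["male", "female"]}
print(columns_to_encode(data))                             # ['Sex']
print(columns_to_encode(data, columns=["Pclass", "Sex"]))  # ['Pclass', 'Sex']
```

Returning names in frame order, rather than in the order the caller listed them, keeps the output column order stable regardless of how the override is written.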
The column names in the output will automatically include the original column name as a prefix, which can be overridden with the prefix kwarg by passing a list or dictionary.
The same goes for prefix separators (the prefix_sep kwarg).
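One way to resolve the prefix, and identically the separator, for each encoded column is sketched below. The helper name resolve_per_column is hypothetical, and the sketch assumes that a single string applies to every encoded column, a list is matched to the encoded columns by position, a dict is looked up by column name, and anything not given falls back to a default (the column name for prefixes, "_" for separators).

```python
from typing import Dict, List, Union

Spec = Union[None, str, List[str], Dict[str, str]]


def resolve_per_column(spec: Spec, columns: List[str], default: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: map each encoded column to its prefix (or separator)."""
    if spec is None:
        return dict(default)
    if isinstance(spec, str):
        # A single value applies to every encoded column.
        return {c: spec for c in columns}
    if isinstance(spec, list):
        # A list is matched to the encoded columns by position.
        if len(spec) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(spec)}")
        return dict(zip(columns, spec))
    # A dict overrides by column name; unlisted columns keep the default.
    return {c: spec.get(c, default[c]) for c in columns}


encoded = ["Sex", "Embarked"]
prefixes = resolve_per_column({"Embarked": "Port"}, encoded, {c: c for c in encoded})
seps = resolve_per_column("=", encoded, {c: "_" for c in encoded})
names = [f"{prefixes[c]}{seps[c]}{level}" for c, level in [("Sex", "male"), ("Embarked", "S")]]
print(names)  # ['Sex=male', 'Port=S']
```

Using one resolver for both kwargs keeps the list and dict rules identical, which is the symmetry the proposal implies.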
On NaN handling, I think we'll have one {prefix}_NaN output column per original column when dummy_na is True.
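A sketch of the proposed dummy_na behavior for a single column: when it is true, missing values get one extra {prefix}_NaN indicator column; when it is false, a row with a missing value is simply all zeros. encode_column is a hypothetical helper, and treating both None and float NaN as missing is an assumption.

```python
import math
from typing import Any, Dict, List


def is_missing(v: Any) -> bool:
    # Treat None and float NaN as missing values.
    return v is None or (isinstance(v, float) and math.isnan(v))


def encode_column(
    values: List[Any], prefix: str, sep: str = "_", dummy_na: bool = False
) -> Dict[str, List[int]]:
    """Hypothetical: one-hot encode one column, optionally adding a NaN column."""
    # Levels come only from non-missing values, so NaN never becomes a level.
    levels = sorted({v for v in values if not is_missing(v)})
    out = {f"{prefix}{sep}{lvl}": [int(v == lvl) for v in values] for lvl in levels}
    if dummy_na:
        # Exactly one NaN indicator column per original column.
        out[f"{prefix}{sep}NaN"] = [int(is_missing(v)) for v in values]
    return out


cabin = [float("nan"), "C85"]  # the Cabin values from the two rows above
print(encode_column(cabin, "Cabin"))
# {'Cabin_C85': [0, 1]}
print(encode_column(cabin, "Cabin", dummy_na=True))
# {'Cabin_C85': [0, 1], 'Cabin_NaN': [1, 0]}
```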
I've got some tests written already.