Is your feature request related to a problem?
pd.get_dummies provides a way to turn a sequence of category-like data into a one-hot encoded data frame. However, there is no easy way (to my knowledge) of going in the other direction: given a boolean dataframe where the row sums are all 1, produce a categorical series. This task is particularly valuable for serialisation.
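To make the gap concrete, here is a small illustration of the forward direction. The sample data is made up for this example; the point is only that each row of the encoded frame has exactly one hot entry, which is the property an inverse operation could rely on.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
colours = pd.Series(["red", "green", "red", "blue"])

# Forward direction: one column per category, one hot entry per row.
dummies = pd.get_dummies(colours)

# Every row sums to 1, so the original label is recoverable in principle.
row_sums = dummies.astype(int).sum(axis=1)
print(list(dummies.columns))
print(row_sums.tolist())
```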
Describe the solution you'd like
Some way of constructing a Categorical array from a one-hot encoded dataframe (view). To avoid piling extra functionality into the existing constructor, a class method could be used.
Scratch implementation:
import numpy as np
import pandas as pd


class Categorical:
    ...

    @classmethod
    def from_dummies(cls, df: pd.DataFrame, **kwargs):
        onehot = df.astype(bool)
        if (onehot.sum(axis=1) > 1).any():
            raise ValueError("Some rows belong to >1 category")
        # Slot 0 holds NaN so that all-False rows map to a missing value.
        index_into = pd.Series([np.nan] + list(onehot.columns))
        # Weight column i by i + 1; the row sum is then the slot to look up.
        mult_by = np.arange(1, len(index_into))
        indexes = (onehot.astype(int) * mult_by).sum(axis=1)
        values = index_into[indexes]
        return cls(values, categories=df.columns, **kwargs)
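The same idea can also be tried out today without touching the class. The following is only a minimal sketch under my own assumptions: `categorical_from_dummies` is a hypothetical name, and it goes through the existing `pd.Categorical.from_codes` constructor rather than the lookup-series approach above. Rows with no hot entry become missing values via the code `-1`.

```python
import numpy as np
import pandas as pd


def categorical_from_dummies(df: pd.DataFrame, **kwargs) -> pd.Categorical:
    """Hypothetical helper: invert a one-hot frame into a Categorical."""
    onehot = df.astype(bool).to_numpy()
    hot_per_row = onehot.sum(axis=1)
    if (hot_per_row > 1).any():
        raise ValueError("Some rows belong to >1 category")
    # argmax gives the position of the hot column in each row. It returns 0
    # for all-False rows, so those are overwritten with -1, the code that
    # pandas uses for a missing value.
    codes = onehot.argmax(axis=1)
    codes[hot_per_row == 0] = -1
    return pd.Categorical.from_codes(codes, categories=df.columns, **kwargs)


# Made-up input: the third row has no hot entry.
dummies = pd.DataFrame(
    {"a": [1, 0, 0, 0], "b": [0, 1, 0, 0], "c": [0, 0, 0, 1]}
)
result = categorical_from_dummies(dummies)
print(result)
```

Extra keyword arguments such as `ordered=True` are passed straight through to `from_codes`. The sketch assumes the frame has at least one column.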
Describe alternatives you've considered
- A free function (less discoverable, less self-documenting)
- Importing scikit-learn