
Categorical dtype doesn't survive groupby of first, max, min, value_counts etc.: unwanted coercion to object #18502

Closed
@dcolascione

Description

Code Sample

    In [1]: df=pd.DataFrame(dict(payload=[-1,-2,-1,-2], col=pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)));df
    Out[1]: 
       col  payload
    0  foo       -1
    1  bar       -2
    2  bar       -1
    3  qux       -2

    In [2]: df.groupby("payload").first().col.dtype
    Out[2]: dtype('O')

Problem description

Grouping shouldn't coerce a categorical column to object dtype. Categorical dtypes should be preserved as long as possible, for both efficiency and correctness.
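
To illustrate the correctness point for ordered categoricals: per-group `min`/`max` should follow category order, and the integer codes already encode that order. The sketch below assumes pandas >= 0.24 (for `Categorical.from_codes(..., dtype=...)` and `Series.to_numpy()`) and a column with no missing values. `groupby_cat_extreme` is a hypothetical helper, not a pandas API: it aggregates the codes per group and rebuilds a Categorical with the original dtype, so the result stays categorical.

```python
import pandas as pd

# Same frame as in the code sample above.
df = pd.DataFrame(
    {
        "payload": [-1, -2, -1, -2],
        "col": pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True),
    }
)


def groupby_cat_extreme(frame, by, col, how="max"):
    """Hypothetical helper: per-group min/max of an ordered categorical.

    Aggregates the integer codes (which follow category order) and rebuilds
    a Categorical with the original dtype. Assumes no missing values, since
    missing entries are stored as code -1 and would skew min().
    """
    dtype = frame[col].dtype
    # Codes are small integers, so this is a cheap numeric groupby.
    codes = frame[col].cat.codes.groupby(frame[by]).agg(how)
    # Rebuild with dtype= so categories and the ordered flag are kept.
    return pd.Series(
        pd.Categorical.from_codes(codes.to_numpy(), dtype=dtype),
        index=codes.index,
        name=col,
    )


result = groupby_cat_extreme(df, "payload", "col", how="max")
print(result)
print(result.dtype)
```

Working on the codes keeps the aggregation on integers, and passing `dtype=` restores both the categories and `ordered=True`. Because `min`/`max` are only meaningful for ordered categoricals, a helper like this should be limited to ordered columns.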

Expected Output

The result dtype should be CategoricalDtype(categories=['bar', 'foo', 'qux'], ordered=True), just as it is for head():

    In [6]: df.groupby("payload").head().col.dtype
    Out[6]: CategoricalDtype(categories=['bar', 'foo', 'qux'], ordered=True)
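
Until the aggregation preserves the dtype, one possible workaround is to cast each aggregated categorical column back to its original dtype. This is a minimal sketch, assuming a pandas version that exposes `pd.CategoricalDtype` at the top level (0.24+); it restores the dtype after the fact and does not avoid the intermediate object conversion.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "payload": [-1, -2, -1, -2],
        "col": pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True),
    }
)

result = df.groupby("payload").first()

# Cast every categorical column of the input back to its original dtype.
# If the dtype was already preserved, astype() leaves it unchanged; if it
# came back as object, astype() rebuilds the same categories and order.
for name, dtype in df.dtypes.items():
    if isinstance(dtype, pd.CategoricalDtype) and name in result.columns:
        result[name] = result[name].astype(dtype)

print(result["col"].dtype)
```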

Output of pd.show_versions()

<details>
    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.4.3.final.0
    python-bits: 64
    OS: Linux
    OS-release: 4.4.0-98-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8

    pandas: 0.21.0
    pytest: 3.2.5
    pip: 9.0.1
    setuptools: 36.5.0
    Cython: 0.20.1post0
    numpy: 1.13.3
    scipy: 0.13.3
    pyarrow: None
    xarray: 0.9.6
    IPython: 6.2.0
    sphinx: None
    patsy: None
    dateutil: 2.6.1
    pytz: 2017.3
    blosc: None
    bottleneck: None
    tables: 3.1.1
    numexpr: 2.6.4
    feather: None
    matplotlib: 1.3.1
    openpyxl: None
    xlrd: None
    xlwt: None
    xlsxwriter: None
    lxml: None
    bs4: 4.2.1
    html5lib: 0.999
    sqlalchemy: 0.8.4
    pymysql: None
    psycopg2: None
    jinja2: 2.7.2
    s3fs: 0.1.2
    fastparquet: None
    pandas_gbq: None
    pandas_datareader: None

</details>
