less than, greater than cuts on categorical don't follow order #9836

Closed
@olgabot

When subsetting an ordered pd.Categorical object using less than/greater than comparisons on the ordered values, the comparisons follow lexicographical order, not categorical order.

If you create a dataframe and assign categories, you can subset:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({'x':np.arange(10), 'y':list('AAAABBBCCC')})

In [4]: df.y = df.y.astype('category')

In [5]: df.ix[(df.y >= "A") & (df.y <= "B")]
Out[5]: 
   x  y
0  0  A
1  1  A
2  2  A
3  3  A
4  4  B
5  5  B
6  6  B

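But if you subset on an ordered categorical whose category order differs from lexicographical order (here A < C < B), the comparison still uses lexicographical order, so the B rows are returned even though B sorts after C: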

In [6]: df = pd.DataFrame({'x':np.arange(10), 'y':list('AAAABBBCCC')})

In [7]: df.y = pd.Categorical(df.y, categories=['A', 'C', 'B'], ordered=True)

In [8]: df.ix[(df.y >= "A") & (df.y <= "C")]
Out[8]: 
   x  y
0  0  A
1  1  A
2  2  A
3  3  A
4  4  B
5  5  B
6  6  B
7  7  C
8  8  C
9  9  C
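
For reference, here is a minimal sketch of the result I expected, written so that it does not depend on how the comparison operators behave. It compares the integer category codes instead of the labels. The helper name `between_categories` is made up for this example and is not part of pandas. It also uses `.loc` rather than `.ix`, on the assumption that a newer pandas is installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10), "y": list("AAAABBBCCC")})
# Same ordered categorical as above: A < C < B
df["y"] = pd.Categorical(df["y"], categories=["A", "C", "B"], ordered=True)


def between_categories(series, low, high):
    """Boolean mask for low <= value <= high in *category* order.

    Hypothetical helper: it looks up the position of each bound in the
    categories index and compares against the integer codes.
    """
    cats = series.cat.categories
    lo, hi = cats.get_loc(low), cats.get_loc(high)
    codes = series.cat.codes  # missing values get code -1, so they are excluded
    return (codes >= lo) & (codes <= hi)


# Keeps only the A and C rows, because B sorts after C in this ordering.
subset = df.loc[between_categories(df["y"], "A", "C")]
print(subset)
```

With the category order A < C < B, the A and C rows (x = 0 to 3 and 7 to 9) are kept and the B rows are dropped, which is what I would expect `(df.y >= "A") & (df.y <= "C")` to return.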
