When subsetting an ordered pd.Categorical object with less-than/greater-than comparisons against its values, the comparisons follow lexicographical order rather than the categorical order.
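To make the two orderings concrete, here is a minimal sketch (an illustration, not taken from the original report) of what "categorical order" means for an ordered categorical. The order is the position of each value in `categories`, stored as integer codes, and it can differ from the lexicographical order of the underlying strings. The values and the category order A < C < B are assumptions chosen to mirror the second example below.

```python
import pandas as pd

values = ["B", "A", "C"]
# Ordered categorical whose category order (A < C < B) differs from string order
cat = pd.Categorical(values, categories=["A", "C", "B"], ordered=True)

# Categorical order: position in `categories`, stored as integer codes
print(list(cat.categories))      # ['A', 'C', 'B']
print(list(cat.codes))           # codes per value: B -> 2, A -> 0, C -> 1
print(list(cat.sort_values()))   # ['A', 'C', 'B'], sorted by category order
print(cat.min(), cat.max())      # A B, the extremes in category order

# Lexicographical order of the raw strings, for contrast
print(sorted(values))            # ['A', 'B', 'C']
```

Under categorical order "B" is the largest value, whereas lexicographically it sits between "A" and "C"; comparisons such as `<= "C"` should follow the former.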
If you create a DataFrame and convert a column to a categorical whose categories are in lexicographical order, subsetting works as expected:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'x':np.arange(10), 'y':list('AAAABBBCCC')})
In [4]: df.y = pd.Categorical(df.y, ordered=True)
In [5]: df.loc[(df.y >= "A") & (df.y <= "B")]
Out[5]:
   x  y
0  0  A
1  1  A
2  2  A
3  3  A
4  4  B
5  5  B
6  6  B
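But if the ordered categorical has a non-lexicographical category order (here A < C < B), the comparison still follows lexicographical order. Under the categorical order, the mask below should keep only the "A" and "C" rows (x from 0 to 3 and from 7 to 9), yet the "B" rows are returned as well: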
In [6]: df = pd.DataFrame({'x':np.arange(10), 'y':list('AAAABBBCCC')})
In [7]: df.y = pd.Categorical(df.y, categories=['A', 'C', 'B'], ordered=True)
In [8]: df.loc[(df.y >= "A") & (df.y <= "C")]
Out[8]:
   x  y
0  0  A
1  1  A
2  2  A
3  3  A
4  4  B
5  5  B
6  6  B
7  7  C
8  8  C
9  9  C
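For comparison, the following is a sketch of the behavior the report expects, assuming a pandas release in which comparisons on an ordered categorical follow the category order. It also builds the same mask from the integer category codes, which encode the order explicitly and so serve as a cross-check (and a possible workaround) that does not rely on how the comparison operators treat the string values. The variable names `mask`, `code_mask`, `lo` and `hi` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10), "y": list("AAAABBBCCC")})
df["y"] = pd.Categorical(df["y"], categories=["A", "C", "B"], ordered=True)

# Scalar comparisons on an ordered categorical should use A < C < B,
# so "B" rows fail the `<= "C"` test
mask = (df["y"] >= "A") & (df["y"] <= "C")
subset = df.loc[mask]

# Same selection expressed through the integer codes (A -> 0, C -> 1, B -> 2)
cats = df["y"].cat.categories
lo, hi = cats.get_loc("A"), cats.get_loc("C")
codes = df["y"].cat.codes
code_mask = (codes >= lo) & (codes <= hi)

print(subset)
print((mask == code_mask).all())
```

If the comparison followed lexicographical order instead, `mask` would also select rows 4 to 6, and it would no longer agree with `code_mask`.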