
Categorical in dataframe is sorted lexically. #7848

Closed
@has2k1


code

import pandas as pd

df = pd.DataFrame({"id":[6,5,4,3,2,1], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
df["grade"] = pd.Categorical(df["raw_grade"])
df['grade'].cat.reorder_levels(['b', 'e', 'a'])

# sorts 'grade' according to the order of the levels
df.sort(columns=['grade'])  

correct output

  id   raw_grade  grade
1 5    b          b
2 4    b          b
5 1    e          e
0 6    a          a
3 3    a          a
4 2    a          a
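`cat.reorder_levels` and `DataFrame.sort` belong to this 0.14-era development build. As a rough sketch, assuming a current pandas release (1.x or later), the closest equivalents of the first snippet are `cat.reorder_categories` and `DataFrame.sort_values`. Note that `reorder_categories` returns a new Series, so the result has to be assigned back. A stable sort (`kind="mergesort"`) is used here only so that the tie order among equal grades is deterministic.

```python
import pandas as pd

df = pd.DataFrame({"id": [6, 5, 4, 3, 2, 1],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = pd.Categorical(df["raw_grade"])
# reorder_categories returns a new Series; assign it back to the column.
df["grade"] = df["grade"].cat.reorder_categories(["b", "e", "a"])

# Single-column sort: rows follow the category order b, e, a.
# mergesort is stable, so rows with equal grades keep their original order.
by_grade = df.sort_values(by="grade", kind="mergesort")
print(by_grade["id"].tolist())  # [5, 4, 1, 6, 3, 2]
```

The ids come out in the same order as the correct output above.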

code

# sorts 'grade' lexically
df.sort(columns=['grade', 'id'])

wrong output

  id   raw_grade  grade
4 2    a          a
3 3    a          a
0 6    a          a
2 4    b          b
1 5    b          b
5 1    e          e

When the `columns` list contains more than one column, Categorical columns are sorted lexically instead of by the order of their levels.
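To state the expected two-key ordering precisely: the primary key is the position of each grade in the level list (its category code), and the secondary key is `id`. The sketch below assumes a current pandas and NumPy. It computes that ordering explicitly with `numpy.lexsort` and compares it with a sort on the raw strings, which reproduces the lexical order from the wrong output above. For brevity, the levels are passed directly to `pd.Categorical` instead of being reordered afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [6, 5, 4, 3, 2, 1],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = pd.Categorical(df["raw_grade"], categories=["b", "e", "a"])

# Expected order: primary key = category code of grade (b=0, e=1, a=2),
# secondary key = id. np.lexsort treats the LAST key as the primary one.
codes = df["grade"].cat.codes.to_numpy()
order = np.lexsort((df["id"].to_numpy(), codes))
expected = df.iloc[order]

# Lexical order: sorting on the raw strings instead of the categories.
lexical = df.sort_values(by=["raw_grade", "id"])

print(expected.index.tolist())  # [2, 1, 5, 4, 3, 0]
print(lexical.index.tolist())   # [4, 3, 0, 2, 1, 5]
```

A correct multi-column sort, `df.sort_values(by=["grade", "id"])`, should produce the first index order, not the second.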

pandas: 0.14.1-78-g24b309f
