
DOC: Clarify stability of sorting in documentation of DataFrame.sort_values for multiple columns / labels #38357

Closed
@jotasi


Location of the documentation

Description of the argument kind in the documentation of pandas.DataFrame.sort_values.

Documentation problem

The current documentation states (emphasis mine):

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

To me, that means the kind argument is ignored if you sort by multiple columns / labels. One could even read it as saying that mergesort is explicitly not used in that case. In some tests I ran, sorting by multiple columns / labels does appear to be stable, but this is not stated anywhere. If stability really is guaranteed when sorting by multiple columns / labels, it would be nice to have this written down explicitly in the documentation.
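A check along the following lines shows the kind of test meant here. It is only a sketch: the helper name `is_stable_sort` and the `_orig` column are made up for illustration, and a passing run is an observation about the installed pandas version, not proof of a documented guarantee.

```python
import numpy as np
import pandas as pd


def is_stable_sort(df, by):
    """Return True if df.sort_values(by) keeps tied rows in their original order.

    Hypothetical helper for illustration only.
    """
    # Tag every row with its original position (assumed column name: _orig).
    tagged = df.assign(_orig=np.arange(len(df)))

    # The sort under test: ties between rows are broken by the algorithm itself.
    observed = tagged.sort_values(by=list(by))

    # Reference ordering: ties are broken explicitly by original position,
    # which is what a stable sort has to produce.
    expected = tagged.sort_values(by=list(by) + ["_orig"])

    return observed["_orig"].tolist() == expected["_orig"].tolist()


# Many duplicate (a, b) pairs, so that ties actually occur.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "a": rng.integers(0, 3, size=1000),
        "b": rng.integers(0, 3, size=1000),
    }
)

stable = is_stable_sort(df, ["a", "b"])
print(stable)
```

The comparison works because `_orig` is unique, so the reference ordering is fully determined regardless of which algorithm pandas uses internally.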

Suggested fix for documentation

If that is actually the case, I would suggest adding a sentence to the effect that sorting is stable when it is done by multiple columns / labels, e.g. by changing the description of the kind argument to:

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label. For DataFrames, if sorting by multiple columns or labels, this argument is ignored, defaulting to a stable sorting algorithm.

Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Docs
