Skip to content

ENH: DataFrame sort columns by rows: sort_values(axis=1) #13622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,7 @@ Other enhancements
- ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`)
- A top-level function :func:`union_categorical` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`)
- ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`)
- ``DataFrame.sort_values()`` has gained support for re-ordering columns by index label (:issue:`10806`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not clear what actually you are changing. This is for axis 1 support for by

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't they mean the same thing? (by is always required sort_values, and axis 1 is equivalent to "columns".) See if the new wording in 2773cdf hits what you're getting at, otherwise can you please specify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewording in that commit is good! (exactly what I wanted to suggest to combine both :-))

Maybe just "with" after "columns"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


.. _whatsnew_0190.api:

Expand Down
7 changes: 3 additions & 4 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3127,9 +3127,8 @@ def sort_values(self, by, axis=0, ascending=True, inplace=False,
kind='quicksort', na_position='last'):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the .sort_values doc-string to look like others, e.g. look at sortlevel (for the axis arg)

axis = self._get_axis_number(axis)
other_axis = 0 if axis == 1 else 1

if axis != 0:
raise ValueError('When sorting by column, axis must be 0 (rows)')
if not isinstance(by, list):
by = [by]
if com.is_sequence(ascending) and len(by) != len(ascending):
Expand All @@ -3145,7 +3144,7 @@ def trans(v):

keys = []
for x in by:
k = self[x].values
k = self.xs(x, axis=other_axis).values
if k.ndim == 2:
raise ValueError('Cannot sort by duplicate column %s' %
str(x))
Expand All @@ -3157,7 +3156,7 @@ def trans(v):
from pandas.core.groupby import _nargsort

by = by[0]
k = self[by].values
k = self.xs(by, axis=other_axis).values
if k.ndim == 2:

# try to be helpful
Expand Down
25 changes: 21 additions & 4 deletions pandas/tests/frame/test_sorting.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ def test_sort_values(self):
frame = DataFrame([[1, 1, 2], [3, 1, 0], [4, 5, 6]],
index=[1, 2, 3], columns=list('ABC'))

# by column
# by column (axis=0)
sorted_df = frame.sort_values(by='A')
indexer = frame['A'].argsort().values
expected = frame.ix[frame.index[indexer]]
Expand Down Expand Up @@ -116,9 +116,26 @@ def test_sort_values(self):
self.assertRaises(ValueError, lambda: frame.sort_values(
by=['A', 'B'], axis=2, inplace=True))

msg = 'When sorting by column, axis must be 0'
with assertRaisesRegexp(ValueError, msg):
frame.sort_values(by='A', axis=1)
# by row (axis=1): GH 10806
sorted_df = frame.sort_values(by=3, axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw in an axis='columns' instead of one of the axis=1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the string columns should be supported as a valid axis value? That's neat. Can you point me to where that's documented though? I can't find reference to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what _get_axis_number(..) does

the doc-string should be something like:

axis : {index (0), columns (1)}

which we programatically generate (this is from DataFrame.min for example), look in generic.py to see how

expected = frame
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=3, axis=1, ascending=False)
expected = frame.reindex(columns=['C', 'B', 'A'])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 2], axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for the other options, na_position (also add some nas), and inplace.

expected = frame.reindex(columns=['B', 'A', 'C'])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 3], axis=1,
ascending=[True, False])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 3], axis=1, ascending=False)
expected = frame.reindex(columns=['C', 'B', 'A'])
assert_frame_equal(sorted_df, expected)

msg = r'Length of ascending \(5\) != length of by \(2\)'
with assertRaisesRegexp(ValueError, msg):
Expand Down