groupby sorting - don't specify, or forbid?

From the last call, the cuDF folks stated that a goal of theirs is to let users prototype with pandas on CPU, and then switch to GPU with cuDF and have their code "just work".

One issue they currently face is `groupby`: pandas sorts the group keys by default, so that's what their users expect.

Even if pandas were to adopt the standard in its main namespace, this would not accomplish their goal. The standard doesn't specify whether `groupby` should sort, but it doesn't forbid it either. pandas could therefore continue sorting in `groupby` by default whilst respecting the standard, and cuDF would be no better off.
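To make the behaviour in question concrete, here is a minimal sketch of how pandas' default differs from `sort=False`. The frame and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data; the column names are arbitrary.
df = pd.DataFrame({"key": ["b", "a", "b", "c", "a"], "value": [1, 2, 3, 4, 5]})

# Default (sort=True): group keys come back in sorted order.
sorted_keys = df.groupby("key")["value"].sum().index.tolist()

# sort=False: group keys come back in order of first appearance.
unsorted_keys = df.groupby("key", sort=False)["value"].sum().index.tolist()

print(sorted_keys)    # ['a', 'b', 'c']
print(unsorted_keys)  # ['b', 'a', 'c']
```

Code that silently relies on the first ordering is the kind that could behave differently on a library that doesn't sort by default.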

cuDF also said that if pandas were to implement the standard in a separate namespace, then users wouldn't necessarily want to look up the standard and would expect things to just work as they're used to (i.e. pandas main namespace).

The only way I can think of to accomplish cuDF's goal, without changes in cuDF, would be:
1. the Standard forbids sorting in `groupby`
2. pandas adopts the standard in its main namespace

I don't mean to be a "Pessimistic Pete", but the second one seems unlikely to land. Some deviations from pandas in cuDF may be warranted.

My suggestion is:
- for cuDF's end-user groupby issue, they could remove the default from `groupby`, thus forcing users to specify a value for `sort`. Users would be required to type an extra 2 words, but they'd probably be better off because of it
- the Standard stays developer-oriented, rather than end-user-oriented
- to minimise surprise to users of the Standard, the Standard explicitly forbids sorting in `groupby`. It's been mentioned that developers might only test their code using pandas and then expect it to work with other DataFrame libraries, so this would reduce the chances of surprises. And so long as it's in a separate namespace, there's no risk of breaking millions of users' code
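As a rough sketch of the first suggestion (no default for `sort`), here is what a required keyword-only argument looks like. `groupby_explicit` is a hypothetical helper wrapping pandas for illustration; it is not an existing pandas or cuDF API.

```python
import pandas as pd


def groupby_explicit(df, by, *, sort):
    # Hypothetical wrapper: `sort` is keyword-only with no default,
    # so every caller has to state whether they want sorted groups.
    return df.groupby(by, sort=sort)


df = pd.DataFrame({"key": ["b", "a", "b"], "value": [1, 2, 3]})

try:
    groupby_explicit(df, "key")  # `sort` omitted
except TypeError as exc:
    print("rejected:", exc)

result = groupby_explicit(df, "key", sort=False)["value"].sum()
print(result.index.tolist())  # ['b', 'a']
```

Omitting `sort` fails immediately with a `TypeError`, instead of silently picking one behaviour or the other.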

EDIT: on the last point - maybe the standard doesn't need to forbid sorting, but the pandas implementation of it shouldn't sort by default

groupby sorting - don't specify, or forbid? #102