Description
Let's talk column reductions.
I see two use cases for them:
- a user wants the exact value of a scalar, right now:

  ```python
  df: DataFrame
  df.col('a').mean()
  ```
- a user just needs to use the scalar as part of another operation, so it can stay lazy if necessary:

  ```python
  df: DataFrame
  df.assign((df.col('a') - df.col('a').mean()).rename('a_centered'))
  ```
The Standard currently defines the return value of `Column.mean` to be `Scalar`. Implementations are supposed to figure out which of the two cases above the user wants.
I have two problems with this:
- I really don't like anything related to implicit materialisation (so long as we're defining a top-level Python API)
- we have an inconsistency with the DataFrame case: `DataFrame.mean` returns a 1-row DataFrame, whereas `Column.mean` returns a `Scalar`
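
For concreteness, here is the current behaviour side by side (a sketch only, reusing the `df` from the examples above):

```python
df.mean()           # 1-row DataFrame
df.col('a').mean()  # Scalar, under the current Standard
```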
Proposal
Column reductions return 1-row Columns (just like how DataFrame reductions return 1-row DataFrames).
Broadcasting rules: in a binary operation between an n-row Column and a 1-row Column, the 1-row Column is broadcast to length n. So `column - column.mean()` is well-defined, and everything can stay lazy if necessary.
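
As a sketch of how the centering example above would read under this proposal (same assumed `df` and method names as in the examples above; nothing here forces materialisation):

```python
df: DataFrame
mean_a = df.col('a').mean()                    # 1-row Column under this proposal
centered = df.col('a') - mean_a                # 1-row Column broadcast to length n
df = df.assign(centered.rename('a_centered'))  # can stay lazy end to end
```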
If someone really needs the value of a reduction now, they can call `.get_value(0)`. The behaviour of the returned scalar may vary between implementations, but I think that's fine.
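
And a sketch of the eager case, assuming `get_value` takes a row position and returns whatever scalar type the implementation chooses:

```python
mean_a = df.col('a').mean()  # 1-row Column, possibly lazy
value = mean_a.get_value(0)  # triggers materialisation; concrete scalar type may vary
```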
At the very least, for the (much more common) case where reductions are used as part of other operations, everything can now stay completely within the DataFrame API, the rules become predictable, and the behaviour is well-defined.