inconsistent name handling in value_counts, part 2 #11579

I was happy to see in the release notes for 0.17.0 that `value_counts` no longer discards the series name, but the implementation isn't what I expected.

0.17.0 gives

``` py
>>> series = pd.Series([1731, 364, 813, 1731], name='user_id')
>>> series.value_counts()
1731    2
813     1
364     1
Name: user_id, dtype: int64
```

which doesn't set the index name.

In my opinion the old series name belongs in the index name of the result, not in its series name. I would expect this instead:

``` py
>>> series.value_counts()
user_id
1731    2
813     1
364     1
dtype: int64
```

Why:
- It's logical: the user_id has moved to the index, and the values now represent occurrence counts
- This would be consistent with how `.groupby().size()` behaves
- Adding a missing index name is cumbersome and requires creating a temporary variable
- In many cases the series name is discarded, while index names tend to stick around: for example, `pd.DataFrame({'n': series.value_counts(), 'has_duplicates': series.value_counts() > 1})` should really have user_id as an index name
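
To illustrate the second and third points, here is a minimal sketch comparing `.groupby().size()` with the temporary-variable workaround. The variable names `sizes` and `counts` are just for illustration; the snippet sets the index name explicitly, so it does not depend on what `value_counts` itself does with names.

``` py
import pandas as pd

series = pd.Series([1731, 364, 813, 1731], name='user_id')

# Grouping a series by itself and taking the group sizes gives the same
# counts, and the result's index carries the name of the grouping key.
sizes = series.groupby(series).size()
print(sizes.index.name)

# With value_counts, labelling the index takes a temporary variable:
counts = series.value_counts()
counts.index.name = series.name
print(counts.index.name)
```

Both `print` calls show `user_id`; the difference is that `.groupby().size()` gets there without the extra assignment.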

There are three options:
- `result.name = None` and `result.index.name = series.name`
- `result.name = series.name` and `result.index.name = series.name`
- `result.name = 'size'` or `'count'` and `result.index.name = series.name`
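
As a rough sketch, the three options could be emulated on top of the existing method like this. `value_counts_named` and its `option` argument are hypothetical names made up for this example, not part of pandas, and `'count'` is picked arbitrarily over `'size'` for the third option.

``` py
import pandas as pd


def value_counts_named(series, option=1):
    """Hypothetical helper emulating the three naming options."""
    result = series.value_counts()
    # All three options move the old series name to the index.
    result.index.name = series.name
    if option == 1:
        result.name = None
    elif option == 2:
        result.name = series.name
    elif option == 3:
        result.name = 'count'
    else:
        raise ValueError('option must be 1, 2 or 3')
    return result


series = pd.Series([1731, 364, 813, 1731], name='user_id')
for option in (1, 2, 3):
    result = value_counts_named(series, option)
    print(option, repr(result.name), repr(result.index.name))
```

The helper overwrites both names explicitly, so it only shows what each option would look like to the caller, regardless of what `value_counts` returns on a given version.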

The first option seems more elegant to me, but @sinhrks, who reported #10150, apparently expected `result.name` to be filled, so perhaps there are use cases where the second option is useful.

