Description
I was happy to see in the release notes for 0.17.0 that value_counts no longer discards the series name, but the implementation wasn't what I expected.
0.17.0 gives
>>> series = pd.Series([1731, 364, 813, 1731], name='user_id')
>>> series.value_counts()
1731    2
813     1
364     1
Name: user_id, dtype: int64
which doesn't set the index name.
In my opinion the old series name belongs in the index, not in the series name:
>>> series.value_counts()
user_id
1731    2
813     1
364     1
dtype: int64
Why:
- It's logical: the user_id has moved to the index, and the values now represent occurrence counts
- This would be consistent with how .groupby().size() behaves
- Adding a missing index name is cumbersome and requires creating a temporary variable
- In many cases the series name is discarded, while index names tend to stick around: for example, pd.DataFrame({'n': series.value_counts(), 'has_duplicates': series.value_counts() > 1}) should really have user_id as an index name
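To make the points above concrete, here is a minimal sketch that compares against .groupby().size() and shows the workaround. The frame `df` and the variable names are made up for illustration. The explicit assignment to `counts.index.name` is the temporary-variable step I mean.

```python
import pandas as pd

series = pd.Series([1731, 364, 813, 1731], name='user_id')

# groupby().size() labels the result's index with the grouping key.
df = pd.DataFrame({'user_id': series})  # hypothetical frame, for comparison only
sizes = df.groupby('user_id').size()
print(sizes.index.name)

# Workaround: a temporary variable is needed just to set the index name.
counts = series.value_counts()
counts.index.name = series.name

# With the index name set, it carries over into a derived DataFrame.
frame = pd.DataFrame({'n': counts, 'has_duplicates': counts > 1})
print(frame.index.name)
```

Both print calls should show user_id, which is the behaviour I would like value_counts to give without the extra assignment.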
There are three options:
- result.name = None and result.index.name = series.name
- result.name = series.name and result.index.name = series.name
- result.name = 'size' or 'count' and result.index.name = series.name
The first option seems more elegant to me, but @sinhrks, who reported #10150, apparently expected result.name to be filled, so perhaps there are use cases where the second option is useful.
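For comparison, the three options can be emulated on top of the current behaviour. This is only a sketch: `counts_with_names` is a hypothetical helper, not something pandas provides, and it sets both names explicitly so the result does not depend on what value_counts returns by default.

```python
import pandas as pd


def counts_with_names(series, result_name, index_name):
    """Hypothetical helper: value_counts with explicit result and index names."""
    counts = series.value_counts()
    counts.name = result_name
    counts.index.name = index_name
    return counts


series = pd.Series([1731, 364, 813, 1731], name='user_id')

# Option 1: result.name = None, result.index.name = series.name
option1 = counts_with_names(series, None, series.name)

# Option 2: result.name = series.name, result.index.name = series.name
option2 = counts_with_names(series, series.name, series.name)

# Option 3: result.name = 'count', result.index.name = series.name
option3 = counts_with_names(series, 'count', series.name)

for option in (option1, option2, option3):
    print(repr(option.name), repr(option.index.name))
```

In all three cases the counts themselves are identical and only the two name attributes differ.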