DOC: `value_counts` description doesn't match code logic · Issue #48635 · pandas-dev/pandas

Pandas version checks

I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

dev docs

last release docs

Documentation problem

value_counts utility incorrectly claims counting "unique rows", respectively "unique combinations".

In reality, it only does groupby followed by size, no nunique:

pandas/pandas/core/frame.py

Lines 6468 to 6592 in ca60aab

    
               def value_counts( 
        
                   self, 
        
                   subset: Sequence[Hashable] | None = None, 
        
                   normalize: bool = False, 
        
                   sort: bool = True, 
        
                   ascending: bool = False, 
        
                   dropna: bool = True, 
        
               ): 
        
                   """ 
        
                   Return a Series containing counts of unique rows in the DataFrame. 
        
                   .. versionadded:: 1.1.0 
        
                   Parameters 
        
                   ---------- 
        
                   subset : list-like, optional 
        
                       Columns to use when counting unique combinations. 
        
                   normalize : bool, default False 
        
                       Return proportions rather than frequencies. 
        
                   sort : bool, default True 
        
                       Sort by frequencies. 
        
                   ascending : bool, default False 
        
                       Sort in ascending order. 
        
                   dropna : bool, default True 
        
                       Don’t include counts of rows that contain NA values. 
        
                       .. versionadded:: 1.3.0 
        
                   Returns 
        
                   ------- 
        
                   Series 
        
                   See Also 
        
                   -------- 
        
                   Series.value_counts: Equivalent method on Series. 
        
                   Notes 
        
                   ----- 
        
                   The returned Series will have a MultiIndex with one level per input 
        
                   column. By default, rows that contain any NA values are omitted from 
        
                   the result. By default, the resulting Series will be in descending 
        
                   order so that the first element is the most frequently-occurring row. 
        
                   Examples 
        
                   -------- 
        
                   >>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6], 
        
                   ...                    'num_wings': [2, 0, 0, 0]}, 
        
                   ...                   index=['falcon', 'dog', 'cat', 'ant']) 
        
                   >>> df 
        
                           num_legs  num_wings 
        
                   falcon         2          2 
        
                   dog            4          0 
        
                   cat            4          0 
        
                   ant            6          0 
        
                   >>> df.value_counts() 
        
                   num_legs  num_wings 
        
                   4         0            2 
        
                   2         2            1 
        
                   6         0            1 
        
                   dtype: int64 
        
                   >>> df.value_counts(sort=False) 
        
                   num_legs  num_wings 
        
                   2         2            1 
        
                   4         0            2 
        
                   6         0            1 
        
                   dtype: int64 
        
                   >>> df.value_counts(ascending=True) 
        
                   num_legs  num_wings 
        
                   2         2            1 
        
                   6         0            1 
        
                   4         0            2 
        
                   dtype: int64 
        
                   >>> df.value_counts(normalize=True) 
        
                   num_legs  num_wings 
        
                   4         0            0.50 
        
                   2         2            0.25 
        
                   6         0            0.25 
        
                   dtype: float64 
        
                   With `dropna` set to `False` we can also count rows with NA values. 
        
                   >>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'], 
        
                   ...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']}) 
        
                   >>> df 
        
                     first_name middle_name 
        
                   0       John       Smith 
        
                   1       Anne        <NA> 
        
                   2       John        <NA> 
        
                   3       Beth      Louise 
        
                   >>> df.value_counts() 
        
                   first_name  middle_name 
        
                   Beth        Louise         1 
        
                   John        Smith          1 
        
                   dtype: int64 
        
                   >>> df.value_counts(dropna=False) 
        
                   first_name  middle_name 
        
                   Anne        NaN            1 
        
                   Beth        Louise         1 
        
                   John        Smith          1 
        
                               NaN            1 
        
                   dtype: int64 
        
                   """ 
        
                   if subset is None: 
        
                       subset = self.columns.tolist() 
        
                   counts = self.groupby(subset, dropna=dropna).grouper.size() 
        
                   if sort: 
        
                       counts = counts.sort_values(ascending=ascending) 
        
                   if normalize: 
        
                       counts /= counts.sum() 
        
                   # Force MultiIndex for single column 
        
                   if len(subset) == 1: 
        
                       counts.index = MultiIndex.from_arrays( 
        
                           [counts.index], names=[counts.index.name] 
        
                       ) 
        
                   return counts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

DOC: `value_counts` description doesn't match code logic #48635

Pandas version checks

Location of the documentation

Documentation problem

Suggested fix for documentation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	def value_counts(
	self,
	subset: Sequence[Hashable] \| None = None,
	normalize: bool = False,
	sort: bool = True,
	ascending: bool = False,
	dropna: bool = True,
	):
	"""
	Return a Series containing counts of unique rows in the DataFrame.

	.. versionadded:: 1.1.0

	Parameters
	----------
	subset : list-like, optional
	Columns to use when counting unique combinations.
	normalize : bool, default False
	Return proportions rather than frequencies.
	sort : bool, default True
	Sort by frequencies.
	ascending : bool, default False
	Sort in ascending order.
	dropna : bool, default True
	Don’t include counts of rows that contain NA values.

	.. versionadded:: 1.3.0

	Returns
	-------
	Series

	See Also
	--------
	Series.value_counts: Equivalent method on Series.

	Notes
	-----
	The returned Series will have a MultiIndex with one level per input
	column. By default, rows that contain any NA values are omitted from
	the result. By default, the resulting Series will be in descending
	order so that the first element is the most frequently-occurring row.

	Examples
	--------
	>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
	... 'num_wings': [2, 0, 0, 0]},
	... index=['falcon', 'dog', 'cat', 'ant'])
	>>> df
	num_legs num_wings
	falcon 2 2
	dog 4 0
	cat 4 0
	ant 6 0

	>>> df.value_counts()
	num_legs num_wings
	4 0 2
	2 2 1
	6 0 1
	dtype: int64

	>>> df.value_counts(sort=False)
	num_legs num_wings
	2 2 1
	4 0 2
	6 0 1
	dtype: int64

	>>> df.value_counts(ascending=True)
	num_legs num_wings
	2 2 1
	6 0 1
	4 0 2
	dtype: int64

	>>> df.value_counts(normalize=True)
	num_legs num_wings
	4 0 0.50
	2 2 0.25
	6 0 0.25
	dtype: float64

	With `dropna` set to `False` we can also count rows with NA values.

	>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
	... 'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
	>>> df
	first_name middle_name
	0 John Smith
	1 Anne <NA>
	2 John <NA>
	3 Beth Louise

	>>> df.value_counts()
	first_name middle_name
	Beth Louise 1
	John Smith 1
	dtype: int64

	>>> df.value_counts(dropna=False)
	first_name middle_name
	Anne NaN 1
	Beth Louise 1
	John Smith 1
	NaN 1
	dtype: int64
	"""
	if subset is None:
	subset = self.columns.tolist()

	counts = self.groupby(subset, dropna=dropna).grouper.size()

	if sort:
	counts = counts.sort_values(ascending=ascending)
	if normalize:
	counts /= counts.sum()

	# Force MultiIndex for single column
	if len(subset) == 1:
	counts.index = MultiIndex.from_arrays(
	[counts.index], names=[counts.index.name]
	)

	return counts

Uh oh!

DOC: value_counts description doesn't match code logic #48635

Description

Pandas version checks

Location of the documentation

Documentation problem

Suggested fix for documentation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

DOC: `value_counts` description doesn't match code logic #48635