Dataframe Groupby value_counts with bins parameter #32471

Open
@scottboston

Description

Found via a Stack Overflow post.

import pandas as pd

df = pd.DataFrame([[0,0],[1,100],[0,100],[2,0],[3,100],[4,100],[4,100],[4,100],[1,100],[3,100]], columns=['key','score'])
df.groupby('key')['score'].value_counts(bins=[0,20,40,60,80,100])

Problem description

The counts are output with the wrong index labels. Note: the data only falls in the 0-20 and 80-100 bins, yet counts are reported under (20.0, 40.0].

key  score         
0    (-0.001, 20.0]    1
     (20.0, 40.0]      1
     (40.0, 60.0]      0
     (60.0, 80.0]      0
     (80.0, 100.0]     0
1    (20.0, 40.0]      2
     (-0.001, 20.0]    0
     (40.0, 60.0]      0
     (60.0, 80.0]      0
     (80.0, 100.0]     0
2    (-0.001, 20.0]    1
     (20.0, 40.0]      0
     (40.0, 60.0]      0
     (60.0, 80.0]      0
     (80.0, 100.0]     0
3    (20.0, 40.0]      2
     (-0.001, 20.0]    0
     (40.0, 60.0]      0
     (60.0, 80.0]      0
     (80.0, 100.0]     0
4    (20.0, 40.0]      3
     (-0.001, 20.0]    0
     (40.0, 60.0]      0
     (60.0, 80.0]      0
     (80.0, 100.0]     0
Name: score, dtype: int64

Expected Output

df.groupby('key')['score'].apply(pd.Series.value_counts, bins=[0,20,40,60,80,100])

key                
0    (80.0, 100.0]     1
     (-0.001, 20.0]    1
     (60.0, 80.0]      0
     (40.0, 60.0]      0
     (20.0, 40.0]      0
1    (80.0, 100.0]     2
     (60.0, 80.0]      0
     (40.0, 60.0]      0
     (20.0, 40.0]      0
     (-0.001, 20.0]    0
2    (-0.001, 20.0]    1
     (80.0, 100.0]     0
     (60.0, 80.0]      0
     (40.0, 60.0]      0
     (20.0, 40.0]      0
3    (80.0, 100.0]     2
     (60.0, 80.0]      0
     (40.0, 60.0]      0
     (20.0, 40.0]      0
     (-0.001, 20.0]    0
4    (80.0, 100.0]     3
     (60.0, 80.0]      0
     (40.0, 60.0]      0
     (20.0, 40.0]      0
     (-0.001, 20.0]    0
Name: score, dtype: int64
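
As an independent cross-check of the expected counts, the scores can be binned with `pd.cut` first and then counted per key, which avoids `value_counts(bins=...)` entirely. This is only a sketch for verifying the numbers above, not a proposed fix; the names `bin_codes` and `counts` are made up for illustration, and `include_lowest=True` is an assumption chosen to match the `(-0.001, 20.0]` first interval shown in both outputs.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0], [1, 100], [0, 100], [2, 0], [3, 100],
     [4, 100], [4, 100], [4, 100], [1, 100], [3, 100]],
    columns=["key", "score"],
)
bins = [0, 20, 40, 60, 80, 100]

# labels=False returns the integer position of each bin (0 = first bin,
# 4 = last bin); include_lowest=True puts a score of 0 in the first bin.
bin_codes = pd.cut(df["score"], bins=bins, include_lowest=True, labels=False)

# Count rows per (key, bin position), then spread the bins across columns.
counts = (
    df.assign(bin=bin_codes)
    .groupby(["key", "bin"])
    .size()
    .unstack(fill_value=0)
    # Add the bins that never occur so every key shows all five bins.
    .reindex(columns=range(len(bins) - 1), fill_value=0)
)
print(counts)
```

Column 0 corresponds to (-0.001, 20.0] and column 4 to (80.0, 100.0]. Only those two columns are non-zero, which agrees with the `apply(pd.Series.value_counts, ...)` result and not with the labels produced by the groupby `value_counts` call.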

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.5
tables : 3.5.2
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : 0.45.0

Labels

Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Groupby
