Skip to content

BUG: Wrong kurtosis outcome due to inadequate fix to previous issues #57972

Open
@j7168908jx

Description

@j7168908jx

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import polars as pl
import pandas as pd
import numpy as np
import scipy.stats as st

data = np.array([-2.05191341e-05,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -4.10391103e-05,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00])

print(pl.Series(data).kurtosis())
print(pd.Series(data).kurt())
print(st.kurtosis(data))

Issue Description

The output of pandas kurtosis function is incorrect.

After simple debugging I found a comment at core/nanops.py line 1360, in function nankurt,
saying to fix #18044 it manually zeros out values less than 1e-14, which is in any way improper.
This affects whatever data comes with not much variance but lots of data.

Expected Behavior

Output of provided example:

14.916104870028523
0.0
14.916104870028551

Expected output: roughly 14.9161 for unbiased (pandas's default behaviour) is correct.

Installed Versions

INSTALLED VERSIONS

commit : bdc79c1
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.0-1010-nvidia-lowlatency
Version : #10-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 26 00:40:27 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.2.0
gcsfs : None
matplotlib : 3.8.3
numba : 0.59.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions