Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
# Data file: https://github.com/FlorinAndrei/misc/blob/master/HeartDisease.csv
import pandas as pd
hd = pd.read_csv("HeartDisease.csv")
pd.cut(hd["Age"], bins=3, include_lowest=True)
Issue Description
The lowest of the three bins created is: (28.951, 45.0]. This is incorrect in several ways.
First off, I expect a left-inclusive bin there. That bin is not left-inclusive.
Secondly, the minimum value in that column is 29. It is not 28.951 - that float is an artifact of the library and does not exist in the data.
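For context, the pandas documentation for pd.cut says that with an integer bins argument the range of the data is extended by 0.1% so that the minimum and maximum fall inside the bins. The sketch below only reproduces that arithmetic in plain Python; it is not the library's code. The maximum of 77 is an assumption inferred from the 45.0 upper edge of the first of three equal-width bins, and lowering only the left edge is an assumption about the right-closed case.

```python
# Assumed inputs: min age 29 is stated in the report; max age 77 is inferred
# from the first bin ending at 45.0 with three equal-width bins.
lo, hi, n_bins = 29.0, 77.0, 3

# Equal-width edges over the raw data range
width = (hi - lo) / n_bins
edges = [lo + i * width for i in range(n_bins + 1)]

# Extend the lower edge by 0.1% of the range so that the minimum value
# falls inside a right-closed first interval (assumed right-closed case)
edges[0] -= 0.001 * (hi - lo)

print(edges)
```

This gives a lower edge of about 28.952, which is 0.001 above the 28.951 shown in the label. The remaining difference presumably comes from how the label is adjusted when include_lowest=True, but that part is not verified here.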
One workaround you can find online is this:
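# Let pandas compute the edges, then round them to whole numbers
_, edges = pd.cut(hd["Age"], bins=3, include_lowest=True, retbins=True)
edges_r = [round(x) for x in edges]
# include_lowest=True keeps the minimum (29) inside the first bin
pd.cut(hd["Age"], bins=edges_r, include_lowest=True)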
But this is pointless and annoying. The library should simply return the true minimum value.
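Another possible workaround, sketched here under the assumption that plain string labels are acceptable in place of Interval objects, is to build the labels by hand so the first bin is shown as closed on both sides. The helper name closed_first_labels is hypothetical, and the edges [29, 45, 61, 77] are assumed from the values discussed above. One thing to watch with rounded edges: if include_lowest=True is left out of the final pd.cut call, the minimum value sits exactly on the open left edge and is not assigned to any bin.

```python
def closed_first_labels(edges):
    """Build interval labels where only the first bin is closed on the left."""
    labels = []
    for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
        opener = "[" if i == 0 else "("
        labels.append(f"{opener}{left}, {right}]")
    return labels


# Assumed edges: the rounded values from the workaround above
edges = [29, 45, 61, 77]
labels = closed_first_labels(edges)
print(labels)

# Intended use with pandas (not executed here):
# pd.cut(hd["Age"], bins=edges, labels=labels, include_lowest=True)
```

The labels then read as expected, but the result holds strings rather than intervals, so this only addresses how the bins are displayed.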
Expected Behavior
The bin I expect is [29.0, 45.0]. I would also settle for [29, 45].
Installed Versions
pd.show_versions()
Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\pandas\util\_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\pandas\util\_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\pandas\compat\_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "C:\Program Files\Python310\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1002, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 945, in _find_spec
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\_distutils_hack\__init__.py", line 79, in find_spec
    return method()
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\_distutils_hack\__init__.py", line 100, in spec_for_pip
    if self.pip_imported_during_build():
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\_distutils_hack\__init__.py", line 111, in pip_imported_during_build
    return any(
  File "C:\Users\flori\AppData\Roaming\Python\Python310\site-packages\_distutils_hack\__init__.py", line 112, in <genexpr>
    frame.f_globals['__file__'].endswith('setup.py')
KeyError: '__file__'
I do have the latest pandas installed (1.4.3), but there's another bug now with show_versions() that prevents me from printing that info.
Python 3.10.6
Numpy 1.22.4
Windows 10
Jupyter Notebook
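While show_versions() is failing, the versions listed above could also be collected programmatically. This is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical. It reads package metadata rather than importing the packages, on the assumption that this stays clear of the import chain in the traceback, which has not been confirmed on the affected machine.

```python
import platform
import sys
from importlib import metadata


def installed_versions(packages=("pandas", "numpy")):
    """Return {package: version or None} from installed package metadata."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            found[name] = None
    return found


print("python", sys.version.split()[0])
print("platform", platform.platform())
print(installed_versions())
```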