Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Put this in `tmp.pyx`:
# In Cython code - any use of `_libs.khash` will trigger this
from pandas._libs.khash cimport kh_int64_t
Then run `cython tmp.pyx`. That will result in:
Error compiling Cython file:
------------------------------------------------------------
...
bint kh_exist_strbox(kh_strbox_t*, khiter_t) nogil
khuint_t kh_needed_n_buckets(khuint_t element_n) nogil
include "khash_for_primitive_helper.pxi"
^
------------------------------------------------------------
/home/rgommers/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/_libs/khash.pxd:129:0: 'khash_for_primitive_helper.pxi' not found
Issue Description
I found this when following up on #49115 (comment):
Cython.Compiler.Errors.InternalError: Internal compiler error: 'khash_for_primitive_helper.pxi' not found
There are a couple of related issues that interact here:
1. pandas is shipping lots of files in wheels that should not be there, in particular `.pxd` and `.pyx` files in `pandas/_libs`.
2. Use of absolute `cimport`s, which should probably be relative.
3. Use of `include "<name>.pxi"` in `.pxd` files. This should be replaced by shared declarations in a common `.pxd` file (see the warning in http://docs.cython.org/en/latest/src/userguide/language_basics.html#the-include-statement-and-include-files).
For (1), if you download any pandas 1.5.3 wheel, you'll see in `pandas/_libs`:
- `khash.pxd`
- `khash_for_primitive_helper.pxi.in`

And, notably, `khash.pxd` contains `include "khash_for_primitive_helper.pxi"`, and that file is not present (only the `.pxi.in` template is). So this is basically a broken private `.pxd`, which is then picked up during the build in gh-49115 because of absolute `from pandas._libs.khash cimport ...` statements inside pandas itself.
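To check whether an installed pandas is affected, one option is to scan `pandas/_libs` for `.pxd` files whose `include` targets are missing. Below is a minimal sketch, assuming the included file is expected to sit next to the `.pxd`; `find_missing_includes` is a hypothetical helper (not part of pandas or Cython), and since Cython also searches its include path, any hit is a candidate rather than proof of breakage.

```python
import re
import sys
from pathlib import Path

# Matches a Cython include directive such as: include "khash_for_primitive_helper.pxi"
INCLUDE_RE = re.compile(r"""^\s*include\s+["']([^"']+)["']""", re.MULTILINE)


def find_missing_includes(root):
    """Return (pxd_path, included_name) pairs whose include target is absent.

    Simplifying assumption: only the directory containing the .pxd is
    checked, not the full Cython include path.
    """
    missing = []
    for pxd in sorted(Path(root).rglob("*.pxd")):
        text = pxd.read_text(encoding="utf-8", errors="replace")
        for name in INCLUDE_RE.findall(text):
            if not (pxd.parent / name).exists():
                missing.append((pxd, name))
    return missing


if __name__ == "__main__":
    # Each command-line argument is a directory to scan, e.g. .../site-packages/pandas/_libs
    for root in sys.argv[1:]:
        for pxd, name in find_missing_includes(root):
            print(f"{pxd}: include {name!r} not found")
```

Run it as `python check_includes.py <site-packages>/pandas/_libs` (the script name is arbitrary). Based on the file listing above, I'd expect it to flag the `include` line in `khash.pxd` on a 1.5.3 install.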
That particular issue probably shows up in the Meson build but not in the `setup.py`-based build, because in the latter the `.pxi` file is generated in-place rather than in the build dir. However, as my reproducer above shows, this is a bit of a house of cards, because the absolute `from pandas._libs` cimports are actually broken.
Expected Behavior
Expected is that the `.pxd` files aren't shipped, so anyone trying to access private `.pxd` files will get a clear exception. This will be fixed automatically when the Meson build is merged. However, that still leaves potential issues in any environments that already have pandas installed.
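Since a wheel is a zip archive, the standard-library `zipfile` module is enough to see which Cython sources a given wheel ships. A minimal sketch; `list_cython_sources` and the `CYTHON_SUFFIXES` tuple are hypothetical names chosen for illustration, not existing pandas tooling, and the suffix list is an assumption about which files count as "should not be there".

```python
import sys
import zipfile

# Assumed set of Cython source/template suffixes to look for inside a wheel.
CYTHON_SUFFIXES = (".pxd", ".pyx", ".pxi", ".pxi.in")


def list_cython_sources(wheel_path):
    """Return the sorted archive names of Cython source files in a wheel.

    `wheel_path` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name for name in wheel.namelist() if name.endswith(CYTHON_SUFFIXES)
        )


if __name__ == "__main__":
    # Each command-line argument is a .whl file to inspect.
    for wheel_path in sys.argv[1:]:
        for name in list_cython_sources(wheel_path):
            print(f"{wheel_path}: {name}")
```

An empty result for a wheel built with the Meson backend would match the expected behavior described here.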
My suggestion is to:
- Use relative `cimport`s for accessing anything within pandas (needs testing, because Cython's `cimport` mechanism is very fragile all around).
- Get rid of the `.pxi.in` and replace it with the recommended `.pxd` method.
Installed Versions
INSTALLED VERSIONS
commit : 2e218d1
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.1-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Sun, 26 Feb 2023 03:39:23 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.1
hypothesis : 6.68.2
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.8
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : 1.3.6
brotli :
fastparquet : 2023.2.0
fsspec : 2023.1.0
gcsfs : 2023.1.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.1.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.4
tables : 3.7.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
xlwt : None
zstandard : 0.19.0
tzdata : None