Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
from pandas.core.internals.construction import _try_cast

arr = np.arange(0, 10, dtype="int64")
# should return early
_try_cast(arr, None, False, False)
Problem description
During Series construction, sanitize_array uses _try_cast to cast the input to a better type. _try_cast is fairly slow, so it tries to return early without casting in common cases. However, due to what looks like a missing not keyword, _try_cast appears to run the full cast for the very cases it wants to avoid (like the one above).
Here are the relevant lines of _try_cast:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/construction.py#L511-L513
Should this be not maybe_castable(arr)?
It is very surprising that lines like the following would intentionally create an array and then try to cast it, even when the dtype passed is None:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/construction.py#L429-L432
Expected Output
_try_cast should not run during sanitize_array for common types (e.g. int64). However, from profiling with %%prun and stepping through with pdb, I can see that functions like maybe_cast_to_datetime are called.
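For anyone who wants to repeat this check outside a notebook, here is a minimal sketch of the same idea using only the standard library. It runs a callable under cProfile and collects the names of every function that was called. The helper called_function_names and the stand-ins slow_cast and construct are hypothetical names made up for illustration; they are not pandas functions.

```python
import cProfile
import pstats


def called_function_names(func, *args, **kwargs):
    """Run func under cProfile and return the names of all functions called."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    # stats.stats is keyed by (filename, line number, function name)
    return {name for (_file, _line, name) in stats.stats}


# Stand-ins for illustration only; these are NOT pandas functions.
def slow_cast(values):
    return [int(v) for v in values]


def construct(values, fast_path):
    # A working fast path returns the input untouched and never calls slow_cast.
    return values if fast_path else slow_cast(values)


names_fast = called_function_names(construct, [1, 2, 3], True)
names_slow = called_function_names(construct, [1, 2, 3], False)

print("slow_cast" in names_fast)
print("slow_cast" in names_slow)
```

With pandas installed, the same helper could presumably be pointed at pd.Series with the int64 array from the code sample, and the returned set checked for maybe_cast_to_datetime.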
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.9
dateutil : 2.8.0
pip : 19.1.1
setuptools : 39.0.1
Cython : None
pytest : 4.4.2
hypothesis : None
sphinx : 2.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None