BUG: SAS datetime column with null values cannot be parsed. #39725

Closed
@wertha

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd
pd.read_sas('dates_null.sas7bdat')

Problem description

When a datetime column contains null values, pd.read_sas raises an exception instead of returning a DataFrame.

Traceback

Traceback (most recent call last):
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 52, in _convert_datetimes
    return pd.to_datetime(sas_datetimes, unit=unit, origin="1960-01-01")
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 805, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 345, in _convert_listlike_datetimes
    result, tz_parsed = tslib.array_with_unit_to_datetime(
  File "pandas/_libs/tslib.pyx", line 249, in pandas._libs.tslib.array_with_unit_to_datetime
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with unit 'd'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sasreader.py", line 152, in read_sas
    return reader.read()
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 723, in read
    rslt = self._chunk_to_dataframe()
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 771, in _chunk_to_dataframe
    rslt[name] = _convert_datetimes(rslt[name], "d")
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 59, in _convert_datetimes
    return sas_datetimes.apply(
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/series.py", line 4135, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 60, in <lambda>
    lambda sas_float: datetime(1960, 1, 1) + timedelta(days=sas_float)
ValueError: cannot convert float NaN to integer
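
The final ValueError can be reproduced without the SAS file. The sketch below assumes the fallback path behaves like the lambda shown in the traceback; `naive_convert` is a hypothetical stand-in for that lambda, not the pandas source itself.

```python
from datetime import datetime, timedelta


def naive_convert(sas_float):
    # Same expression as the lambda in the traceback: days since 1960-01-01.
    return datetime(1960, 1, 1) + timedelta(days=sas_float)


# A regular value converts fine.
ok_value = naive_convert(1.0)

# A missing value (NaN) makes timedelta() raise ValueError.
try:
    naive_convert(float("nan"))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok_value)
print(error_message)
```

This suggests that the per-element fallback has no handling for missing values, so any NaN in the column is enough to trigger the error.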

Expected Output

A DataFrame in which null values in the datetime column are represented as NaT.
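
A minimal sketch of the behaviour I would expect, assuming a per-element conversion with an explicit check for missing values. `convert_sas_dates_nan_safe` is a hypothetical helper written for illustration; it is not an existing pandas function and not necessarily how a fix should be implemented.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def convert_sas_dates_nan_safe(sas_values: pd.Series) -> pd.Series:
    """Convert SAS day counts (days since 1960-01-01), mapping NaN to NaT."""

    def convert_one(value):
        # Missing values become NaT instead of reaching timedelta().
        if pd.isna(value):
            return pd.NaT
        return datetime(1960, 1, 1) + timedelta(days=value)

    return sas_values.apply(convert_one)


# Hypothetical input: two valid day counts and one null.
sample = pd.Series([0.0, np.nan, 1.0])
converted = convert_sas_dates_nan_safe(sample)
print(converted)
```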

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.9.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.14-arch1-1
Version : #1 SMP PREEMPT Sun, 07 Feb 2021 22:42:17 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.2
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
