BUG: replacing out of bound datetimes is not possible #36782

Closed
@krassowski

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

from datetime import datetime
from pandas import DataFrame

# typo in the data entry, should have been 2020!
df = DataFrame({'x': [datetime(2920, 10, 1)]})
# let's try to fix that in code:
df.x.replace({datetime(2920, 10, 1): datetime(2020, 10, 1)})
# or
df.replace({datetime(2920, 10, 1): datetime(2020, 10, 1)})

Raises:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2920-10-01 00:00:00
Full traceback details
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-1-a4bdb67c8945> in <module>
      5 df = DataFrame({'x': [datetime(2920, 10, 1)]})
      6 # let's try to fix that in code:
----> 7 df.x.replace({datetime(2920, 10, 1): datetime(2020, 10, 1)})

/site-packages/pandas/core/series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   4561         method="pad",
   4562     ):
-> 4563         return super().replace(
   4564             to_replace=to_replace,
   4565             value=value,

/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6495                 to_replace, value = keys, values
   6496 
-> 6497             return self.replace(
   6498                 to_replace, value, inplace=inplace, limit=limit, regex=regex
   6499             )

/site-packages/pandas/core/series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   4561         method="pad",
   4562     ):
-> 4563         return super().replace(
   4564             to_replace=to_replace,
   4565             value=value,

/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6538                         )
   6539                     self._consolidate_inplace()
-> 6540                     new_data = self._mgr.replace_list(
   6541                         src_list=to_replace,
   6542                         dest_list=value,

/site-packages/pandas/core/internals/managers.py in replace_list(self, src_list, dest_list, inplace, regex)
    640         mask = ~isna(values)
    641 
--> 642         masks = [comp(s, mask, regex) for s in src_list]
    643 
    644         result_blocks = []

/site-packages/pandas/core/internals/managers.py in <listcomp>(.0)
    640         mask = ~isna(values)
    641 
--> 642         masks = [comp(s, mask, regex) for s in src_list]
    643 
    644         result_blocks = []

/site-packages/pandas/core/internals/managers.py in comp(s, mask, regex)
    633                 return ~mask
    634 
--> 635             s = com.maybe_box_datetimelike(s)
    636             return _compare_or_regex_search(values, s, regex, mask)
    637 

/site-packages/pandas/core/common.py in maybe_box_datetimelike(value, dtype)
     88 
     89     if isinstance(value, (np.datetime64, datetime)):
---> 90         value = tslibs.Timestamp(value)
     91     elif isinstance(value, (np.timedelta64, timedelta)):
     92         value = tslibs.Timedelta(value)

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
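
For context, the bounds check at the bottom of the traceback rejects the value because a nanosecond-resolution timestamp stored in a signed 64-bit integer cannot reach the year 2920. The standard-library sketch below approximates the upper bound (truncated to microseconds, since datetime has no nanosecond field) and compares it with the mistyped date. The names INT64_MAX_NS, ns_upper_bound and typo_date are mine, for illustration only; this is not how pandas implements the check.

```python
from datetime import datetime, timedelta

# Largest signed 64-bit integer, read as nanoseconds since the Unix epoch.
INT64_MAX_NS = 2**63 - 1

# datetime only has microsecond resolution, so truncate nanoseconds to microseconds.
ns_upper_bound = datetime(1970, 1, 1) + timedelta(microseconds=INT64_MAX_NS // 1000)

# The mistyped date from the code sample above.
typo_date = datetime(2920, 10, 1)
is_out_of_bounds = typo_date > ns_upper_bound

print(ns_upper_bound)    # 2262-04-11 23:47:16.854775
print(is_out_of_bounds)  # True
```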

Problem description

Being able to replace out-of-bounds dates is useful precisely because pandas cannot represent them as nanosecond timestamps. However, replace no longer works for such values. This is a regression: the code above worked in pandas 1.0.4, but fails in pandas 1.1.2 and on master.
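
A possible interim workaround, sketched under the assumption that the column holds plain Python datetime objects (object dtype): look each value up in a dict through Series.map with a function, which should avoid the boxing step seen in the traceback. I have not verified this against every affected pandas version. The names fixes, fixed and converted, as well as the second in-range date, are hypothetical and only there for illustration.

```python
from datetime import datetime

import pandas as pd

# Keep the values as plain Python datetime objects (object dtype).
# The second date is made-up, in-range data that should pass through untouched.
s = pd.Series([datetime(2920, 10, 1), datetime(2019, 5, 17)], dtype=object)

# Hypothetical mapping of known typos to their corrections.
fixes = {datetime(2920, 10, 1): datetime(2020, 10, 1)}

# Look each value up in the dict; values without an entry are returned unchanged.
fixed = s.map(lambda value: fixes.get(value, value))

# With every value now in range, converting to a datetime64 dtype should succeed.
converted = pd.to_datetime(fixed)
```

This is only a stopgap for cleaning the data; it does not replace a fix to replace itself.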

Expected Output

The DataFrame returned by df.replace(...) should be equal to DataFrame({'x': [datetime(2020, 10, 1)]})

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2a7d332
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-48-generic
Version : #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.0
