Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
(Hypothesis) Test name: test_on_offset_implementations
Test location: pandas/tests/tseries/offsets/test_offsets_properties.py
The following pair of test inputs failed at some point in the hypothesis test during the checks for #41488, raising pytz.AmbiguousTimeError.
dt = datetime.datetime(1900, 1, 1, 0, 0, tzinfo=pytz.timezone('Africa/Kinshasa'))
offset = pd.offsets.MonthBegin(66)
I didn't get the blob for @reproduce_failure, but the failure can be reproduced in the same way with @example, as follows. Please let me know if there is a less messy, more copy-and-paste-able way to reproduce the test failure.
import datetime

import pytz
from hypothesis import example
from pandas.tseries.offsets import MonthBegin

# The failing input pair reported above
dt = datetime.datetime(1900, 1, 1, 0, 0, tzinfo=pytz.timezone('Africa/Kinshasa'))
offset = MonthBegin(66)

@example(dt, offset)
# ------ prepend the code snippet above to reproduce test failure ------
@pytest.mark.arm_slow
@given(gen_random_datetime, gen_yqm_offset)
def test_on_offset_implementations(dt, offset):
    assume(not offset.normalize)
    # check that the class-specific implementations of is_on_offset match
    # the general case definition:
    #   (dt + offset) - offset == dt
    try:
        compare = (dt + offset) - offset
    except pytz.NonExistentTimeError:
        # dt + offset does not exist, assume(False) to indicate
        # to hypothesis that this is not a valid test case
        assume(False)

    assert offset.is_on_offset(dt) == (compare == dt)
Expected Output
dt + offset yields pd.Timestamp('1905-07-01', tzinfo=pytz.timezone('Africa/Kinshasa')), which I think is correctly a DST-ambiguous timestamp given the following pytz transition data. So the test should probably catch pytz.AmbiguousTimeError as well and call assume(False) directly.
In [16]: import pytz
    ...: print(pytz.timezone('Africa/Kinshasa')._utc_transition_times[1])
    ...: print(pytz.timezone('Africa/Kinshasa')._transition_info[1])
    ...:
1905-06-30 23:46:25
(datetime.timedelta(0), datetime.timedelta(0), 'GMT')
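To make the ambiguity concrete, here is a minimal sketch of the arithmetic using only the standard library. It assumes the 14-minute LMT offset that pytz reports for this zone in the traceback (`LMT+0:14:00`) and the transition values printed above; the helper name `is_ambiguous` is made up for illustration and is not part of pytz or pandas.

```python
from datetime import datetime, timedelta

# Values copied from the pytz output and the traceback in this report.
utc_transition = datetime(1905, 6, 30, 23, 46, 25)  # _utc_transition_times[1]
offset_before = timedelta(minutes=14)  # LMT+0:14:00, as pytz reports it
offset_after = timedelta(0)            # GMT, from _transition_info[1]

# Wall-clock readings at the instant of the transition, under each offset.
wall_before = utc_transition + offset_before  # 1905-07-01 00:00:25
wall_after = utc_transition + offset_after    # 1905-06-30 23:46:25


def is_ambiguous(local):
    """Return True if this naive local time occurs twice around the transition."""
    # The clock moves back from wall_before to wall_after, so every reading
    # in [wall_after, wall_before) is shown once before and once after.
    return wall_after <= local < wall_before


print(is_ambiguous(datetime(1905, 7, 1, 0, 0)))  # True
print(is_ambiguous(datetime(1905, 7, 1, 0, 1)))  # False
```

Under these assumptions, midnight on 1905-07-01 falls inside the repeated 14-minute window, which is consistent with the AmbiguousTimeError raised for `1905-07-01 00:00:00`.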
Actual Output
============================================== FAILURES ==============================================
___________________________________ test_on_offset_implementations ___________________________________
@example(dt, offset)
> # ------ prepend code snippet to reproduce test failure ------
@pytest.mark.arm_slow
@given(gen_random_datetime, gen_yqm_offset)
pandas/tests/tseries/offsets/test_offsets_properties.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/tests/tseries/offsets/test_offsets_properties.py:108: in test_on_offset_implementations
    compare = (dt + offset) - offset
pandas/_libs/tslibs/offsets.pyx:430: in pandas._libs.tslibs.offsets.BaseOffset.__add__
    return other.__add__(self)
pandas/_libs/tslibs/offsets.pyx:432: in pandas._libs.tslibs.offsets.BaseOffset.__add__
    return self.apply(other)
pandas/_libs/tslibs/offsets.pyx:178: in pandas._libs.tslibs.offsets.apply_wraps.wrapper
    result = result.tz_localize(tz)
pandas/_libs/tslibs/timestamps.pyx:1705: in pandas._libs.tslibs.timestamps.Timestamp.tz_localize
    value = tz_localize_to_utc_single(self.value, tz,
pandas/_libs/tslibs/tzconversion.pyx:74: in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single
    return tz_localize_to_utc(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   raise pytz.AmbiguousTimeError(
E   pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1905-07-01 00:00:00, try using the 'ambiguous' argument
pandas/_libs/tslibs/tzconversion.pyx:284: AmbiguousTimeError
--------------------------------------------- Hypothesis ---------------------------------------------
Falsifying explicit example: test_on_offset_implementations(
    dt=datetime.datetime(1900, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Africa/Kinshasa' LMT+0:14:00 STD>),
    offset=<66 * MonthBegins>,
)
Problem description
See above.
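The fix suggested under Expected Output could look roughly like the sketch below. It is not the actual pandas patch. To keep it runnable outside the test module, the check is wrapped in a hypothetical helper, `on_offset_matches_roundtrip`, which returns None at the point where the real test would call assume(False). It assumes pandas raises the pytz exception classes, as in the traceback above.

```python
import pandas as pd
import pytz
from pandas.tseries.offsets import MonthBegin


def on_offset_matches_roundtrip(dt, offset):
    """Return None when the round trip cannot be localized, else the comparison."""
    try:
        compare = (dt + offset) - offset
    except (pytz.NonExistentTimeError, pytz.AmbiguousTimeError):
        # dt + offset is missing or ambiguous in dt's time zone; the real
        # test would call assume(False) here to discard the example.
        return None
    # Same check as the existing test: the class-specific is_on_offset
    # must agree with the general round-trip definition.
    return offset.is_on_offset(dt) == (compare == dt)


# An unproblematic input: both definitions say "not on offset", so they agree.
print(on_offset_matches_roundtrip(pd.Timestamp("2021-01-15"), MonthBegin(1)))
```

Catching both exceptions in one `except` clause keeps the test body unchanged otherwise; whether ambiguous results should be discarded or resolved in some other way is a design decision for the maintainers.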
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 4d1019487a73d3e1791e1bace774522307ec2bbc
python : 3.8.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-55-generic
Version : #62~20.04.1-Ubuntu SMP Wed Jun 2 08:55:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0.dev0+1850.g4d1019487a.dirty
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : 0.29.22
pytest : 6.2.2
hypothesis : 6.8.1
sphinx : 3.5.3
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.21.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.7
fastparquet : 0.5.0
gcsfs : 0.7.2
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : 0.5.2
scipy : 1.6.1
sqlalchemy : 1.4.2
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.17.0
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.0