-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
TST: query with timezone aware index & column #34021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"Africa/Abidjan", | ||
"Africa/Douala", | ||
"Africa/Mbabane", | ||
"America/Argentina/Catamarca", | ||
"America/Belize", | ||
"America/Curacao", | ||
"America/Guatemala", | ||
"America/Kentucky/Louisville", | ||
"America/Mexico_City", | ||
"America/Port-au-Prince", | ||
"America/Sitka", | ||
"Antarctica/Casey", | ||
"Asia/Ashkhabad", | ||
"Asia/Dubai", | ||
"Asia/Khandyga", | ||
"Asia/Qatar", | ||
"Asia/Tomsk", | ||
"Atlantic/Reykjavik", | ||
"Australia/Queensland", | ||
"Canada/Yukon", | ||
"Etc/GMT+7", | ||
"Etc/UCT", | ||
"Europe/Guernsey", | ||
"Europe/Paris", | ||
"Europe/Vienna", | ||
"Indian/Cocos", | ||
"NZ", | ||
"Pacific/Honolulu", | ||
"Pacific/Samoa", | ||
"US/Eastern", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vipulrai91
There's a fixture you can use here, tz_aware_fixture
(feel free to ask for help if it's unclear how to use it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli thank you for the feedback.
As far as I understand the change has to be.
def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
tz_list = tz_aware_fixture
Got this snippet from test_datetime.py , but what actually is tz_aware_fixture?
Also , How Do I run the test locally, I tried going through the docs , but will need some help here.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tz = tz_aware_fixture
should be enough, then pytest will loop through the different timezones defined in tz_aware_fixture
what actually is tz_aware_fixture?
It's a fixture which you can use to test different timezones.
How Do I run the test locally
See using pytest. So here,
$ pytest pandas/tests/frame/indexing/test_column_vs_index_tz.py
should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for guiding.
tz_aware_fixture is a combination of TimeZones and other object types , also even
it throws pytz.exceptions.UnknownTimeZone error.
test_column_vs_index_tz.py:58: TypeError
============================================================ short test summary info =============================================================
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['UTC'] - pytz.exceptions.UnknownTimeZoneErr...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['US/Eastern'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['Asia/Tokyo'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/US/Pacific'] - pytz.exceptions.Un...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/Asia/Singapore'] - pytz.exception...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzutc()] - TypeError: 'tzutc' object is not...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzlocal()] - TypeError: 'tzlocal' object is...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(300)] - TypeError: '_Fixed...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[<UTC>] - TypeError: 'UTC' object is not ite...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(-300)] - TypeError: '_Fixe...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone.utc] - TypeError: 'dateti...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(seconds=3600))]
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(days=-1, seconds=82800), 'foo')]
import pandas._testing as tm | ||
|
||
|
||
class TestColumnvsIndexTZEquality: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need an individual class for this. Writing the test as a function is sufficient.
Don't need a new file for just this test. This test can live in pandas/tests/frame/test_query_eval.py
|
||
class TestColumnvsIndexTZEquality: | ||
# https://github.com/pandas-dev/pandas/issues/29463 | ||
def check_for_various_tz(self, tz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by @MarcoGorelli, you can just use the tz_aware_fixture
here
# # This was earlier raising an exception. | ||
index_query = df.set_index("time").query('"2019-01-03 00:00:00+00" < time') | ||
l2 = pd.DataFrame(list(index_query.index)) | ||
tm.assert_frame_equal(l1, l2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename the variables so this reads tm.assert_frame_equal(result, expected)
} | ||
) | ||
df_query = df.query('"2019-01-03 00:00:00+00" < time') | ||
l1 = pd.DataFrame(list(df_query["time"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this is the "expected" output, we'll want to construct this with a different method without query
expected = pd.DataFrame(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke ,
I modified everything as per suggestion and the tests pass locally.
But here instead of constructing new "expected" DataFrame we would want to use the ones already generating by using query since "We are testing for result to be same at index and column level"
This is what was expected as per issue ? Right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to make sure the "expected" DataFrame takes an independent path than query so if query
breaks both "result" and "expected" don't change.
So want to make sure the result of:
result = df.set_index('time').query('"2019-01-03 00:00:00+00" < time')
gives a result of:
expected = pd.DataFrame(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we create an expected dataframe , it passes for few tz and fails for others , bcoz the query string always expects the time zone to be "UTC" , query('"2019-01-03 00:00:00+00" < time'), and therefore there is shape mismatch for example for "Asia/Kolkata" it is failing whereas it passes for "US/Eastern".
I assume we cannot change the query string because that is what we are testing for , in that case knowing the expected o/p when comparing to different tz will result in shape mismatch which is exactly happening now.
Any thoughts ?
def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
# https://github.com/pandas-dev/pandas/issues/29463
tz = tz_aware_fixture
expected = pd.DataFrame(
{"time": pd.date_range(start="2019-01-04", freq="1d", periods=7, tz=tz)}
).set_index("time")
df = pd.DataFrame(
{"time": pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz)}
)
result = df.set_index("time").query('"2019-01-03 00:00:00+00" < time')
tm.assert_frame_equal(result, expected)
AssertionError: DataFrame are different
DataFrame shape mismatch
[left]: (7, 0)
[right]: (8, 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're just testing that we can query with a tz offset in the query string when the index is tz-aware, we can be with what the query string is.
You can just change the test to be then
tz = tz_aware_fixture
expected = pd.DataFrame(index=pd.date_range(start="2019-01-04", freq="1d", periods=10, tz=tz, name='time'))
df = pd.DataFrame(index=pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz, name=time))
result = df.query('"2018-01-03 00:00:00+00" < time')
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke , Yes this works , I am now banging my head for not thinking this earlier.
Thank you very much , now all the tests pass.
As suggested , I have added the test in test_query_eval.py file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one @vipulrai91 , looks better like this, just got a comment about the dates
start="2019-01-01", freq="1d", periods=10, tz=tz, name="time" | ||
) | ||
) | ||
result = df.query('"2018-01-03 00:00:00+00" < time') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this, time
is always greater than 2018-01-03 00:00:00+00
- I think @mroeschke was suggesting to use, say, 2019-01-04
in expected
, 2019-01-01
in df
, and then 2019-01-03
in query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli Thank you for the comment / appreciation.
If we use 2019-01-03 in query it results in earlier problem , because the shape of "expected" and "df" remains the same also their tz is changing , but in query the tz remains constant and when we query it to diff timezone their shape becomes inconsistent as the query tz is always constant. (So it always result in +1 or -1 records and hence fails.)
Hence I think he suggested to use 2018-01-03 , as it will always result in consistent shape while checking tz-aware for all time zones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I'd misunderstood, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, it seems that your index for both df
and expected
are identical - if so, can you make a variable to share them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. If you could merge master that will fix the failing CI
Thanks. Merging now. |
updating ci/cd from master
@mroeschke All checks passes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
df = pd.DataFrame(index=df_index) | ||
result = df.query('"2018-01-03 00:00:00+00" < time') | ||
tm.assert_frame_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test the working case here, e.g. result = df.reset_index().query(....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Done. Please review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vipinkjonwal thanks for your updates
Like this, what's the difference between df2.set_index("time")
and df
?
I think Jeff meant
result = df.reset_index().query('"2018-01-03 00:00:00+00" < time')
expected = pd.DataFrame(df_index)
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli , I thought this was already working when the related issue was raised , I've updated the test with suggested changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this PR hasn't been merged yet. Mind merging master into this branch?
After that, we can probably merge this in
@mroeschke done. |
Thanks for your patience @vipulrai91. Nice contribution! |
Thank you @mroeschke @MarcoGorelli @jreback for the help. |
* testing issue-29463 * testing issue-29463 * adding test to close issue 29463 * making index as variable - 29463 * adding tz_aware_index_test_2 * adding tz_aware_index_test_3 Co-authored-by: Vipul Rai <“vipulrai8891@gmail.com”>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff