Skip to content

TST: query with timezone aware index & column #34021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 20, 2020

Conversation

vipulr8
Copy link
Contributor

@vipulr8 vipulr8 commented May 6, 2020

Comment on lines 24 to 53
"Africa/Abidjan",
"Africa/Douala",
"Africa/Mbabane",
"America/Argentina/Catamarca",
"America/Belize",
"America/Curacao",
"America/Guatemala",
"America/Kentucky/Louisville",
"America/Mexico_City",
"America/Port-au-Prince",
"America/Sitka",
"Antarctica/Casey",
"Asia/Ashkhabad",
"Asia/Dubai",
"Asia/Khandyga",
"Asia/Qatar",
"Asia/Tomsk",
"Atlantic/Reykjavik",
"Australia/Queensland",
"Canada/Yukon",
"Etc/GMT+7",
"Etc/UCT",
"Europe/Guernsey",
"Europe/Paris",
"Europe/Vienna",
"Indian/Cocos",
"NZ",
"Pacific/Honolulu",
"Pacific/Samoa",
"US/Eastern",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vipulrai91

There's a fixture you can use here, tz_aware_fixture (feel free to ask for help if it's unclear how to use it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli thank you for the feedback.

As far as I understand the change has to be.

def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
    tz_list = tz_aware_fixture

Got this snippet from test_datetime.py , but what actually is tz_aware_fixture?
Also , How Do I run the test locally, I tried going through the docs , but will need some help here.

Thanks

Copy link
Member

@MarcoGorelli MarcoGorelli May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tz = tz_aware_fixture should be enough, then pytest will loop through the different timezones defined in tz_aware_fixture

what actually is tz_aware_fixture?

It's a fixture which you can use to test different timezones.

How Do I run the test locally

See using pytest. So here,

$ pytest pandas/tests/frame/indexing/test_column_vs_index_tz.py

should be enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for guiding.

tz_aware_fixture is a combination of TimeZones and other object types , also even
it throws pytz.exceptions.UnknownTimeZone error.

test_column_vs_index_tz.py:58: TypeError
============================================================ short test summary info =============================================================
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['UTC'] - pytz.exceptions.UnknownTimeZoneErr...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['US/Eastern'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['Asia/Tokyo'] - pytz.exceptions.UnknownTime...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/US/Pacific'] - pytz.exceptions.Un...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query['dateutil/Asia/Singapore'] - pytz.exception...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzutc()] - TypeError: 'tzutc' object is not...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[tzlocal()] - TypeError: 'tzlocal' object is...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(300)] - TypeError: '_Fixed...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[<UTC>] - TypeError: 'UTC' object is not ite...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[pytz.FixedOffset(-300)] - TypeError: '_Fixe...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone.utc] - TypeError: 'dateti...
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(seconds=3600))]
FAILED test_column_vs_index_tz.py::TestColumnvsIndexTZEquality::test_check_column_vs_index_tz_query[datetime.timezone(datetime.timedelta(days=-1, seconds=82800), 'foo')]

import pandas._testing as tm


class TestColumnvsIndexTZEquality:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need an individual class for this. Writing the test as a function is sufficient.

Don't need a new file for just this test. This test can live in pandas/tests/frame/test_query_eval.py


class TestColumnvsIndexTZEquality:
# https://github.com/pandas-dev/pandas/issues/29463
def check_for_various_tz(self, tz):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @MarcoGorelli, you can just use the tz_aware_fixture here

# # This was earlier raising an exception.
index_query = df.set_index("time").query('"2019-01-03 00:00:00+00" < time')
l2 = pd.DataFrame(list(index_query.index))
tm.assert_frame_equal(l1, l2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename the variables so this reads tm.assert_frame_equal(result, expected)

}
)
df_query = df.query('"2019-01-03 00:00:00+00" < time')
l1 = pd.DataFrame(list(df_query["time"]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is the "expected" output, we'll want to construct this with a different method without query

expected = pd.DataFrame(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke ,
I modified everything as per suggestion and the tests pass locally.
But here instead of constructing new "expected" DataFrame we would want to use the ones already generating by using query since "We are testing for result to be same at index and column level"
This is what was expected as per issue ? Right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to make sure the "expected" DataFrame takes an independent path than query so if query breaks both "result" and "expected" don't change.

So want to make sure the result of:
result = df.set_index('time').query('"2019-01-03 00:00:00+00" < time')

gives a result of:
expected = pd.DataFrame(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we create an expected dataframe , it passes for few tz and fails for others , bcoz the query string always expects the time zone to be "UTC" , query('"2019-01-03 00:00:00+00" < time'), and therefore there is shape mismatch for example for "Asia/Kolkata" it is failing whereas it passes for "US/Eastern".

I assume we cannot change the query string because that is what we are testing for , in that case knowing the expected o/p when comparing to different tz will result in shape mismatch which is exactly happening now.
Any thoughts ?

def test_check_column_vs_index_tz_query(self, tz_aware_fixture):
    # https://github.com/pandas-dev/pandas/issues/29463

    tz = tz_aware_fixture
    expected = pd.DataFrame(
        {"time": pd.date_range(start="2019-01-04", freq="1d", periods=7, tz=tz)}
    ).set_index("time")

    df = pd.DataFrame(
        {"time": pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz)}
    )
    result = df.set_index("time").query('"2019-01-03 00:00:00+00" < time')
    tm.assert_frame_equal(result, expected)
AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (7, 0)
[right]: (8, 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're just testing that we can query with a tz offset in the query string when the index is tz-aware, we can be with what the query string is.

You can just change the test to be then

    tz = tz_aware_fixture
    expected = pd.DataFrame(index=pd.date_range(start="2019-01-04", freq="1d", periods=10, tz=tz, name='time'))

    df = pd.DataFrame(index=pd.date_range(start="2019-01-01", freq="1d", periods=10, tz=tz, name=time))
    result = df.query('"2018-01-03 00:00:00+00" < time')
    tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke , Yes this works , I am now banging my head for not thinking this earlier.
Thank you very much , now all the tests pass.

As suggested , I have added the test in test_query_eval.py file.

@simonjayhawkins simonjayhawkins added the Testing pandas testing functions or related to the test suite label May 6, 2020
@simonjayhawkins simonjayhawkins changed the title Issue 29463 add test to close Issue 29463 May 6, 2020
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @vipulrai91 , looks better like this, just got a comment about the dates

start="2019-01-01", freq="1d", periods=10, tz=tz, name="time"
)
)
result = df.query('"2018-01-03 00:00:00+00" < time')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this, time is always greater than 2018-01-03 00:00:00+00 - I think @mroeschke was suggesting to use, say, 2019-01-04 in expected, 2019-01-01 in df, and then 2019-01-03 in query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli Thank you for the comment / appreciation.

If we use 2019-01-03 in query it results in earlier problem , because the shape of "expected" and "df" remains the same also their tz is changing , but in query the tz remains constant and when we query it to diff timezone their shape becomes inconsistent as the query tz is always constant. (So it always result in +1 or -1 records and hence fails.)

Hence I think he suggested to use 2018-01-03 , as it will always result in consistent shape while checking tz-aware for all time zones.

Screenshot 2020-05-07 at 02 01 14

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I'd misunderstood, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, it seems that your index for both df and expected are identical - if so, can you make a variable to share them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ! Thanks.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If you could merge master that will fix the failing CI

@vipulr8
Copy link
Contributor Author

vipulr8 commented May 7, 2020

LGTM. If you could merge master that will fix the failing CI

Thanks. Merging now.

updating ci/cd from master
@vipulr8
Copy link
Contributor Author

vipulr8 commented May 7, 2020

LGTM. If you could merge master that will fix the failing CI

Thanks. Merging now.

@mroeschke All checks passes.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mroeschke mroeschke added this to the 1.1 milestone May 7, 2020
@jreback jreback added the Timezones Timezone data dtype label May 7, 2020
df = pd.DataFrame(index=df_index)
result = df.query('"2018-01-03 00:00:00+00" < time')
tm.assert_frame_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test the working case here, e.g. result = df.reset_index().query(....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Done. Please review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vipinkjonwal thanks for your updates

Like this, what's the difference between df2.set_index("time") and df?

I think Jeff meant

result = df.reset_index().query('"2018-01-03 00:00:00+00" < time')
expected = pd.DataFrame(df_index)
tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli , I thought this was already working when the related issue was raised , I've updated the test with suggested changes.

@jreback jreback changed the title add test to close Issue 29463 TST: query with timezone aware index & column May 7, 2020
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this PR hasn't been merged yet. Mind merging master into this branch?

After that, we can probably merge this in

@vipulr8
Copy link
Contributor Author

vipulr8 commented May 20, 2020

Sorry this PR hasn't been merged yet. Mind merging master into this branch?

After that, we can probably merge this in

@mroeschke done.

@mroeschke mroeschke merged commit 45fee32 into pandas-dev:master May 20, 2020
@mroeschke
Copy link
Member

Thanks for your patience @vipulrai91. Nice contribution!

@vipulr8
Copy link
Contributor Author

vipulr8 commented May 20, 2020

Thanks for your patience @vipulrai91. Nice contribution!

Thank you @mroeschke @MarcoGorelli @jreback for the help.
Looking forward to more contributions.

@vipulr8 vipulr8 deleted the issue-29463 branch May 20, 2020 05:47
PuneethaPai pushed a commit to PuneethaPai/pandas that referenced this pull request May 22, 2020
* testing issue-29463

* testing issue-29463

* adding test to close issue 29463

* making index as variable - 29463

* adding tz_aware_index_test_2

* adding tz_aware_index_test_3

Co-authored-by: Vipul Rai <“vipulrai8891@gmail.com”>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Cannot compare tz-naive and tz-aware datetime-like objects" when querying DatetimeIndex
5 participants