-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
TST: Added test case for DataFrame.at #33577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @EdAbati lgtm
# https://github.com/pandas-dev/pandas/issues/33544 | ||
df = pd.DataFrame({"foo": [datetime(2000, 1, 1)]}) | ||
df.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc) | ||
assert df.at[0, "foo"].tzinfo == timezone.utc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you instead create an expected
Dataframe and use tm.assert_frame_equal
?
result = pd.DataFrame({"foo": [datetime(2000, 1, 1)]})
result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
expected = pd.DataFrame(....)
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mroeschke ,
I have implemented as you suggested, but actually (as you can see in the checks below) this makes the test fail and raises an interesting question.
When creating the dataframe with:
result = pd.DataFrame({"foo": [datetime(2000, 1, 1)]})
It transforms the value result.at[0, "foo"]
into a Timestamp object:Out[4]: Timestamp('2000-01-01 00:00:00')
and running result.info()
will show that the column Dtype is datetime64[ns]
.
If now I apply the following
result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
the result.at[0, "foo"]
is transformed in to a datetime object
Out[6]: datetime.datetime(2000, 1, 2, 0, 0, tzinfo=datetime.timezone.utc)
and also the column Dtype is changed to object
.
I am not sure if this is the expected behaviour or it is a bug.
If this is the expected behaviour I should change the test, otherwise .at
needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the expected behavior as opposed to the loc
variant that would coerce to datetime type.
You can therefore change the expected construction to
expected = pd.DataFrame(np.array([datetime(2000, 1, 2, tzinfo=timezone.utc)], dtype=object, columns['foo'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the expected behavior
this seems inconsisent with other types that cannot hold the scalar type.
>>> import numpy as np
>>>
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1290.g7e99bb433'
>>>
>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> df
a
0 1
1 2
2 3
>>>
>>> df.at[0, "a"] = np.nan
Traceback (most recent call last):
...
ValueError: Cannot assign nan to integer series
>>>
@simonjayhawkins brings up a good point. This is possibly not fully fixed as the expected dtype should not be object but Interested in fixing this bug @EdAbati? |
Hi @mroeschke and @simonjayhawkins , I could try to to give it a go, but I am not super familiar with every internal class and I may need some guidance/a little bit more time. I have tried to debug the The item is set using What do you think, would changing |
I think that behavior is working as expected. The odd part is that essentially the entire column is being changed, so I am not sure why we need to do this data type check. |
thanks, |
my take-away for this is that the behaviour reported in original issue was perhaps the correct behaviour. The loss of the timezone a result of the scalar being coerced. This is similar to #22740. Agreed in both cases the user 'incorrectly' expected a different behaviour, because the scalar could be coerced.
The 'fixed' issue now coerces the entire column from |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff