-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Series constructor drops nanoseconds of Timedelta scalar #38040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(#GH38032) use convert_scalar_for_putitemlike for better conversion
use cast_scalar_to_array as base for ndarray creation
Take care of non np.dtype args
cast_scalar_to_array: more testing + cast shape to tuple where needed
Well, actually, do this changes in cast.py make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a specific test that hits the OP: #38032, also needs a whatsnew note (1.2) if you can do soonish.
pandas/core/dtypes/cast.py
Outdated
value = ensure_str(value) | ||
elif dtype.kind in ["M", "m"]: | ||
# GH38032: filling in Timedelta/Timestamp drops nanoseconds | ||
if isinstance(value, (Timedelta, Timestamp)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the change yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be lossy for tzaware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both for your comments. Yes this is the change.
At first I went to update construct_1d_arraylike_from_scalar
, but tests showed more was broken.
cast_scalar_to_array
is used in DF constructor and but did not check its dtype arg like construct_1d_arraylike_from_scalar
, hence e.g.
In [2]: pd.DataFrame("hello", index=[0], columns=[0], dtype="U")
Out[2]:
0
0 h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to isolate the part taking care of converting a given (scalar, np.dtype) into smth suitable for creating a ndarray filled with that scalar, and cast_scalar_to_array
felt like a good place.
def test_cast_scalar_to_array_conversion_needed( | ||
obj_in, dtype_in, obj_out, dtype_out, shape | ||
): | ||
tm.assert_numpy_array_equal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write this as
result=
expected=
tm.assert_....
"obj_in,dtype_in,obj_out,dtype_out", | ||
[ | ||
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"), | ||
(Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a tz-aware Timestamp case
can you add Period, Interval as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably use this fixture: ea_scalar_and_dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be sure: as we cast to np.dtype, Period and Interval should be left aside, right?
@pytest.mark.parametrize( | ||
"obj_in,dtype_in,obj_out,dtype_out", | ||
[ | ||
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a Nat that is timedelta64[ns]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
else: | ||
fill_value = value | ||
if not isinstance(dtype, np.dtype): | ||
dtype = np.dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there no risk of getting here with an EADtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is, no, cast_scalar_to_array
is called after having checked dtype is not EADtype. It must have been written with that in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/dtypes/cast.py
Outdated
if not isinstance(dtype, np.dtype): | ||
dtype = np.dtype(dtype) | ||
empty = shape and not any(shape) | ||
# rem: type coercion if empty: sometimes yes, sometimes no ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is "rem"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, just remark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but yes, sorry, this is a real question: in the case of empty DF/Series init, is this non-uniformity in dtype casting wanted ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use TODO (or don't use rem its not common nomeclature)
else: | ||
fill_value = value | ||
if not isinstance(dtype, np.dtype): | ||
dtype = np.dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)
pandas/core/dtypes/cast.py
Outdated
if not isinstance(dtype, np.dtype): | ||
dtype = np.dtype(dtype) | ||
empty = shape and not any(shape) | ||
# rem: type coercion if empty: sometimes yes, sometimes no ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use TODO (or don't use rem its not common nomeclature)
dtype = np.dtype("object") | ||
if not isna(value): | ||
value = ensure_str(value) | ||
elif dtype.kind == "m": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could share the "m" and "M" code and for the is_valid_nat_for_dtype case use value = dtype.type("NaT", "ns")
(Timedelta(1), "timedelta64[ns]", 1, "timedelta64[ns]"), | ||
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"), | ||
(Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"), | ||
(Timestamp(1, tz="US/Eastern"), "datetime64[ns]", 1, "datetime64[ns]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a deeply weird behavior that we should be trying to deprecate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, true, it can be counterintuitive that pd.Timestamp("1970-01-01", tz="US/Pacific") != pd.Timestamp(0, tz="US/Pacific")
[ | ||
(Timedelta(1), "timedelta64[ns]"), | ||
(Timestamp(1), "datetime64[ns]"), | ||
(Timestamp(1, tz="US/Eastern"), "datetime64[ns]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same weird behavior
is #38405 doing the same thing as this? |
Yes, I'll close. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff