Skip to content

BUG: Series constructor drops nanoseconds of Timedelta scalar #38040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

ma3da
Copy link
Contributor

@ma3da ma3da commented Nov 24, 2020

@ma3da ma3da marked this pull request as ready for review November 26, 2020 06:24
@ma3da
Copy link
Contributor Author

ma3da commented Nov 26, 2020

Well, actually, do this changes in cast.py make sense?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a specific test that hits the OP: #38032, also needs a whatsnew note (1.2) if you can do soonish.

value = ensure_str(value)
elif dtype.kind in ["M", "m"]:
# GH38032: filling in Timedelta/Timestamp drops nanoseconds
if isinstance(value, (Timedelta, Timestamp)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the change yes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be lossy for tzaware

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for your comments. Yes this is the change.

At first I went to update construct_1d_arraylike_from_scalar, but tests showed more was broken.

cast_scalar_to_array is used in DF constructor and but did not check its dtype arg like construct_1d_arraylike_from_scalar, hence e.g.

In [2]: pd.DataFrame("hello", index=[0], columns=[0], dtype="U")                                                                                                                                                                           
Out[2]: 
   0
0  h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to isolate the part taking care of converting a given (scalar, np.dtype) into smth suitable for creating a ndarray filled with that scalar, and cast_scalar_to_array felt like a good place.

def test_cast_scalar_to_array_conversion_needed(
obj_in, dtype_in, obj_out, dtype_out, shape
):
tm.assert_numpy_array_equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write this as

result=
expected=
tm.assert_....

"obj_in,dtype_in,obj_out,dtype_out",
[
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),
(Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a tz-aware Timestamp case
can you add Period, Interval as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably use this fixture: ea_scalar_and_dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be sure: as we cast to np.dtype, Period and Interval should be left aside, right?

@pytest.mark.parametrize(
"obj_in,dtype_in,obj_out,dtype_out",
[
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a Nat that is timedelta64[ns]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added Bug Timedelta Timedelta data type Constructors Series/DataFrame/Index/pd.array Constructors labels Nov 26, 2020
@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

cc @jbrockmendel

else:
fill_value = value
if not isinstance(dtype, np.dtype):
dtype = np.dtype(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no risk of getting here with an EADtype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is, no, cast_scalar_to_array is called after having checked dtype is not EADtype. It must have been written with that in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if not isinstance(dtype, np.dtype):
dtype = np.dtype(dtype)
empty = shape and not any(shape)
# rem: type coercion if empty: sometimes yes, sometimes no ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "rem"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, just remark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but yes, sorry, this is a real question: in the case of empty DF/Series init, is this non-uniformity in dtype casting wanted ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use TODO (or don't use rem its not common nomeclature)

else:
fill_value = value
if not isinstance(dtype, np.dtype):
dtype = np.dtype(dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)

if not isinstance(dtype, np.dtype):
dtype = np.dtype(dtype)
empty = shape and not any(shape)
# rem: type coercion if empty: sometimes yes, sometimes no ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use TODO (or don't use rem its not common nomeclature)

dtype = np.dtype("object")
if not isna(value):
value = ensure_str(value)
elif dtype.kind == "m":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could share the "m" and "M" code and for the is_valid_nat_for_dtype case use value = dtype.type("NaT", "ns")

(Timedelta(1), "timedelta64[ns]", 1, "timedelta64[ns]"),
(NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),
(Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"),
(Timestamp(1, tz="US/Eastern"), "datetime64[ns]", 1, "datetime64[ns]"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a deeply weird behavior that we should be trying to deprecate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, true, it can be counterintuitive that pd.Timestamp("1970-01-01", tz="US/Pacific") != pd.Timestamp(0, tz="US/Pacific")

[
(Timedelta(1), "timedelta64[ns]"),
(Timestamp(1), "datetime64[ns]"),
(Timestamp(1, tz="US/Eastern"), "datetime64[ns]"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same weird behavior

@jbrockmendel
Copy link
Member

is #38405 doing the same thing as this?

@ma3da
Copy link
Contributor Author

ma3da commented Dec 11, 2020

is #38405 doing the same thing as this?

Yes, I'll close.

@ma3da ma3da closed this Dec 11, 2020
@ma3da ma3da deleted the GH_38032 branch March 22, 2021 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Constructors Series/DataFrame/Index/pd.array Constructors Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: series construction from dict of Timedelta scalar doesn't work
3 participants