BUG: Series constructor drops nanoseconds of Timedelta scalar #38040

ma3da · 2020-11-24T16:22:30Z

closes BUG: series construction from dict of Timedelta scalar doesn't work #38032
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

(#GH38032) use convert_scalar_for_putitemlike for better conversion

add test

reworking

use cast_scalar_to_array as base for ndarray creation

Take care of non np.dtype args

cast_scalar_to_array: more testing + cast shape to tuple where needed
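For reference, the behaviour this PR targets can be checked with a short round trip. This is a minimal sketch, not the test added in the PR: the variable names are made up for illustration, and the check is only expected to succeed on a pandas release that includes a fix for this issue.

```python
import pandas as pd

# A Timedelta of exactly one nanosecond.
td = pd.Timedelta(1, unit="ns")

# Broadcast the scalar over an index (assumed to be the kind of
# construction the PR title refers to).
ser = pd.Series(td, index=[0, 1])

# If the constructor preserved the nanosecond, every element equals the scalar.
roundtrip_ok = bool((ser == td).all())
print(ser.dtype, roundtrip_ok)
```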

ma3da · 2020-11-26T06:28:15Z

Well, actually, do these changes in cast.py make sense?

jreback

can you add a specific test that hits the OP (#38032)? This also needs a whatsnew note (1.2), if you can do it soonish.

jreback · 2020-11-26T17:28:07Z

pandas/core/dtypes/cast.py

+                value = ensure_str(value)
+        elif dtype.kind in ["M", "m"]:
+            # GH38032: filling in Timedelta/Timestamp drops nanoseconds
+            if isinstance(value, (Timedelta, Timestamp)):


this is the change yes?

this will be lossy for tzaware

Thank you both for your comments. Yes this is the change.

At first I went to update construct_1d_arraylike_from_scalar, but tests showed more was broken.

cast_scalar_to_array is used in the DF constructor, but unlike construct_1d_arraylike_from_scalar it did not check its dtype arg, hence e.g.

In [2]: pd.DataFrame("hello", index=[0], columns=[0], dtype="U")
Out[2]:
   0
0  h

I wanted to isolate the part that converts a given (scalar, np.dtype) pair into something suitable for creating an ndarray filled with that scalar, and cast_scalar_to_array felt like a good place.
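The idea of converting a (scalar, np.dtype) pair before filling an array could look roughly like the sketch below. `fill_from_scalar` is a hypothetical helper written for illustration; it is not the code in cast.py, and it only uses the public `to_timedelta64` / `to_datetime64` methods and `np.full`.

```python
import numpy as np
import pandas as pd


def fill_from_scalar(value, dtype, shape):
    """Hypothetical helper: build an ndarray of `shape` filled with `value`."""
    dtype = np.dtype(dtype)
    if dtype.kind == "m" and isinstance(value, pd.Timedelta):
        # Hand numpy a timedelta64 scalar so the integer nanosecond
        # count is carried over as-is.
        value = value.to_timedelta64()
    elif dtype.kind == "M" and isinstance(value, pd.Timestamp):
        # Note: a tz-aware Timestamp loses its timezone here, which is
        # the lossy case raised earlier in this thread.
        value = value.to_datetime64()
    return np.full(shape, value, dtype=dtype)


arr = fill_from_scalar(pd.Timedelta(1, unit="ns"), "timedelta64[ns]", (2,))
print(arr.view("i8"))
```

Viewing the result as int64 exposes the stored nanosecond counts, which makes a dropped nanosecond easy to spot.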

jreback · 2020-11-26T17:29:02Z

pandas/tests/dtypes/cast/test_infer_dtype.py

+def test_cast_scalar_to_array_conversion_needed(
+    obj_in, dtype_in, obj_out, dtype_out, shape
+):
+    tm.assert_numpy_array_equal(


can you write this as

result=
expected=
tm.assert_....
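The requested result / expected layout might look like the sketch below. It uses `np.full` as a stand-in for the function under test, since the real parametrized test in this PR calls the internal cast helper; the test name is hypothetical.

```python
import numpy as np
import pandas._testing as tm


def test_fill_timedelta_scalar():
    # Stand-in for the call under test: fill a length-2 array with 1 ns.
    result = np.full((2,), np.timedelta64(1, "ns"), dtype="timedelta64[ns]")
    expected = np.array([1, 1], dtype="timedelta64[ns]")
    tm.assert_numpy_array_equal(result, expected)


test_fill_timedelta_scalar()
```

Naming the two arrays separately keeps the assertion on one line and makes a failure message easier to read.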

jreback · 2020-11-26T17:29:35Z

pandas/tests/dtypes/cast/test_infer_dtype.py

+    "obj_in,dtype_in,obj_out,dtype_out",
+    [
+        (NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),
+        (Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"),


can you add a tz-aware Timestamp case
can you add Period, Interval as well

you can probably use this fixture: ea_scalar_and_dtype

to be sure: as we cast to np.dtype, Period and Interval should be left aside, right?

jreback · 2020-11-26T17:30:38Z

pandas/tests/dtypes/cast/test_infer_dtype.py

+@pytest.mark.parametrize(
+    "obj_in,dtype_in,obj_out,dtype_out",
+    [
+        (NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),


pls add a NaT case that is timedelta64[ns]

jreback · 2020-11-26T17:33:35Z

cc @jbrockmendel

jbrockmendel · 2020-11-26T22:39:31Z

pandas/core/dtypes/cast.py

    else:
-        fill_value = value
+        if not isinstance(dtype, np.dtype):
+            dtype = np.dtype(dtype)


is there no risk of getting here with an EADtype?

As it is, no, cast_scalar_to_array is called after having checked dtype is not EADtype. It must have been written with that in mind.

can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)
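A guard of the kind requested here, followed by the np.dtype normalization from the diff, could be sketched as follows. `normalize_dtype` is a hypothetical name used only for this example; in the PR the assert would sit at the top of the existing function.

```python
import numpy as np


def normalize_dtype(dtype):
    """Hypothetical helper: accept None, a str, or an np.dtype only."""
    # Extension dtypes are expected to be handled by the caller, so
    # anything else reaching this point indicates a bug upstream.
    assert isinstance(dtype, (type(None), str, np.dtype)), type(dtype)
    if dtype is None:
        return None
    # np.dtype() is a no-op for np.dtype instances and parses strings.
    return np.dtype(dtype)


print(normalize_dtype("timedelta64[ns]"))
```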

jbrockmendel · 2020-11-26T22:39:52Z

pandas/core/dtypes/cast.py

+        if not isinstance(dtype, np.dtype):
+            dtype = np.dtype(dtype)
+        empty = shape and not any(shape)
+        # rem: type coercion if empty: sometimes yes, sometimes no ?


what is "rem"?

oh, just remark

but yes, sorry, this is a real question: when initializing an empty DF/Series, is this non-uniformity in dtype casting wanted?

use TODO (or don't use "rem"; it's not common nomenclature)

pandas/core/dtypes/cast.py

CR

jreback · 2020-11-28T17:40:41Z

pandas/core/dtypes/cast.py

    else:
-        fill_value = value
+        if not isinstance(dtype, np.dtype):
+            dtype = np.dtype(dtype)


can you add an assert isinstance(dtype, (type(None), str, np.dtype)) and a comment (that's what the type annotation indicates) (at the top of the function)

jreback · 2020-11-28T17:41:03Z

pandas/core/dtypes/cast.py

+        if not isinstance(dtype, np.dtype):
+            dtype = np.dtype(dtype)
+        empty = shape and not any(shape)
+        # rem: type coercion if empty: sometimes yes, sometimes no ?


use TODO (or don't use "rem"; it's not common nomenclature)

pandas/tests/frame/test_constructors.py

jbrockmendel · 2020-12-11T04:59:39Z

pandas/core/dtypes/cast.py

+            dtype = np.dtype("object")
+            if not isna(value):
+                value = ensure_str(value)
+        elif dtype.kind == "m":


could share the "m" and "M" code and for the is_valid_nat_for_dtype case use value = dtype.type("NaT", "ns")
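The suggestion relies on `dtype.type` being `np.timedelta64` for kind "m" and `np.datetime64` for kind "M", so a single expression covers both. A minimal sketch, with `nat_for` as a hypothetical helper name:

```python
import numpy as np


def nat_for(dtype):
    """Hypothetical helper: return a NaT scalar matching an m8/M8 dtype."""
    dtype = np.dtype(dtype)
    if dtype.kind not in ("m", "M"):
        raise TypeError(f"expected a datetime64/timedelta64 dtype, got {dtype}")
    # dtype.type is np.timedelta64 or np.datetime64; both accept ("NaT", "ns").
    return dtype.type("NaT", "ns")


td_nat = nat_for("timedelta64[ns]")
dt_nat = nat_for("datetime64[ns]")
print(type(td_nat).__name__, type(dt_nat).__name__)
```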

jbrockmendel · 2020-12-11T05:01:34Z

pandas/tests/dtypes/cast/test_infer_dtype.py

+        (Timedelta(1), "timedelta64[ns]", 1, "timedelta64[ns]"),
+        (NaT, "datetime64[ns]", np.datetime64("NaT"), "datetime64[ns]"),
+        (Timestamp(1), "datetime64[ns]", 1, "datetime64[ns]"),
+        (Timestamp(1, tz="US/Eastern"), "datetime64[ns]", 1, "datetime64[ns]"),


this is a deeply weird behavior that we should be trying to deprecate

Ok, true, it can be counterintuitive that pd.Timestamp("1970-01-01", tz="US/Pacific") != pd.Timestamp(0, tz="US/Pacific")
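The inequality above can be reproduced directly. In this sketch the string is read as wall time in the given zone, while the integer is read as nanoseconds since the UTC epoch and then converted to that zone; it assumes the "US/Pacific" zone is available in the local timezone database.

```python
import pandas as pd

# Midnight wall time in US/Pacific.
wall = pd.Timestamp("1970-01-01", tz="US/Pacific")

# Zero nanoseconds since the UTC epoch, displayed in US/Pacific.
epoch = pd.Timestamp(0, tz="US/Pacific")

different = wall != epoch
offset = wall - epoch
print(different, offset)
```

The two values differ by the zone's UTC offset on that date, which is why an integer paired with a tz-aware Timestamp is easy to misread.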

jbrockmendel · 2020-12-11T05:01:54Z

pandas/tests/frame/test_constructors.py

+        [
+            (Timedelta(1), "timedelta64[ns]"),
+            (Timestamp(1), "datetime64[ns]"),
+            (Timestamp(1, tz="US/Eastern"), "datetime64[ns]"),


same weird behavior

jbrockmendel · 2020-12-11T05:02:41Z

is #38405 doing the same thing as this?

ma3da · 2020-12-11T08:49:23Z

is #38405 doing the same thing as this?

Yes, I'll close.

ma3da added 4 commits November 25, 2020 14:05

BUG: series from dict of Timedelta scalar drops nanoseconds

2a334c9

(#GH38032) use convert_scalar_for_putitemlike for better conversion

BUG: series from dict of Timedelta scalar drops nanoseconds

d5f67b2

add test

BUG: series from dict of Timedelta scalar drops nanoseconds

0e87f77

reworking

BUG: Series/DF constructors may drop Timedelta/Timestamp nanoseconds

774a130

use cast_scalar_to_array as base for ndarray creation

ma3da force-pushed the GH_38032 branch from 76b63b5 to 774a130 Compare November 25, 2020 19:08

ma3da added 3 commits November 25, 2020 21:45

BUG: Series/DF constructors may drop Timedelta/Timestamp nanoseconds

b5054c9

Take care of non np.dtype args

BUG: Series/DF constructors may drop Timedelta/Timestamp nanoseconds

b1fbdb7

cast_scalar_to_array: more testing + cast shape to tuple where needed

Merge remote-tracking branch 'upstream/master' into GH_38032

9c66d98

ma3da marked this pull request as ready for review November 26, 2020 06:24

jreback reviewed Nov 26, 2020

jreback added the Bug, Timedelta, and Constructors labels Nov 26, 2020

jbrockmendel reviewed Nov 26, 2020

ma3da added 2 commits November 27, 2020 00:36

BUG: Series/DF constructors may drop Timedelta/Timestamp nanoseconds

282041d

CR

CR -- WIP

90202a0

jreback requested changes Nov 28, 2020

ma3da added 4 commits November 29, 2020 23:10

cr: add test for DF constructor

41c75cf

cr: assert dtype in _cast_scalar_to_array

940bc2c

Merge remote-tracking branch 'upstream/master' into GH_38032

6d36911

whatsnew entry

96e1876

ma3da commented Nov 29, 2020

ma3da added 3 commits November 30, 2020 00:40

correcting constructor tests

8ae4dfe

prevent Timestamp <-> Timedelta coercion + missed fix on tests

85a4016

Merge remote-tracking branch 'upstream/master' into GH_38032

4e0a787

jbrockmendel reviewed Dec 11, 2020

ma3da closed this Dec 11, 2020

jbrockmendel mentioned this pull request Dec 11, 2020

BUG: Series/DataFrame construction from scalars #38405

Merged

5 tasks

ma3da deleted the GH_38032 branch March 22, 2021 20:24
