Skip to content

Assignment of column via .loc for numpy non-ns datetimes #27928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Sep 13, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion doc/source/whatsnew/v0.25.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,6 @@ Indexing
-
-
-
-

Missing
^^^^^^^
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@ Indexing

- Bug in assignment using a reverse slicer (:issue:`26939`)
- Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`)
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`)

Missing
^^^^^^^
Expand Down
10 changes: 8 additions & 2 deletions pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -1029,7 +1029,10 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
)

if is_datetime64 and not is_dtype_equal(dtype, _NS_DTYPE):
if dtype.name in ("datetime64", "datetime64[ns]"):

# pandas supports dtype whose granularity is less than [ns]
# e.g., [ps], [fs], [as]
if dtype <= np.dtype("M8[ns]"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does <= do for numpy dtypes? Check if it's a subtype?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear to me either:

>>> "timedelta64[ns]" < np.dtype("m8[ns]")
False
>>> "timedelta64[us]" < np.dtype("m8[ns]")
True
>>> "timedelta64[ps]" > np.dtype("m8[ns]")
True
>>> "timedelta64" < np.dtype("m8[ns]")
True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger @jbrockmendel
It checks whether the dtype timespan is longer than [ns] or shorter. (FYI, https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment to this effect (in both places)

if dtype.name == "datetime64":
raise ValueError(msg.format(dtype=dtype.name))
dtype = _NS_DTYPE
Expand All @@ -1047,7 +1050,10 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
value = [value]

elif is_timedelta64 and not is_dtype_equal(dtype, _TD_DTYPE):
if dtype.name in ("timedelta64", "timedelta64[ns]"):

# pandas supports dtype whose granularity is less than [ns]
# e.g., [ps], [fs], [as]
if dtype <= np.dtype("m8[ns]"):
if dtype.name == "timedelta64":
raise ValueError(msg.format(dtype=dtype.name))
dtype = _TD_DTYPE
Expand Down
22 changes: 22 additions & 0 deletions pandas/tests/indexing/test_loc.py
Original file line number Diff line number Diff line change
Expand Up @@ -690,6 +690,28 @@ def test_loc_setitem_consistency_slice_column_len(self):
)
tm.assert_series_equal(df[("Respondent", "Duration")], expected)

@pytest.mark.parametrize("unit", ["Y", "M", "D", "h", "m", "s", "ms", "us"])
def test_loc_assign_non_ns_datetime(self, unit):
# GH 27395, non-ns dtype assignment via .loc should work
# and return the same result when using simple assignment
df = DataFrame(
{
"timestamp": [
np.datetime64("2017-02-11 12:41:29"),
np.datetime64("1991-11-07 04:22:37"),
]
}
)

df.loc[:, unit] = df.loc[:, "timestamp"].values.astype(
"datetime64[{unit}]".format(unit=unit)
)
df["expected"] = df.loc[:, "timestamp"].values.astype(
"datetime64[{unit}]".format(unit=unit)
)
expected = Series(df.loc[:, "expected"], name=unit)
tm.assert_series_equal(df.loc[:, unit], expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an assert about the result here? tm.assert_series_equal(df['day'], expected)

Copy link
Contributor Author

@inmoonlight inmoonlight Aug 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I update the tests, I found a strange result. (Not sure this is intended.)

df = pd.DataFrame({'timestamp': [np.datetime64('2017-01-01 01:02:23'), np.datetime64('2018-11-12 12:34:12')]})
df['year'] = df.loc[:, 'timestamp'].values.astype('datetime64[Y]')
df['month'] = df.loc[:, 'timestamp'].values.astype('datetime64[M]')
df.loc[:, 'day'] = df.loc[:, 'timestamp'].values.astype('datetime64[D]')

result:

            timestamp        day      month       year
0 2017-01-01 01:02:23 2017-01-01 2017-01-01 2017-01-01
1 2018-11-12 12:34:12 2018-11-12 2018-11-01 2018-01-01

astype result is okay.

df.loc[:, 'timestamp'].values.astype('datetime64[Y]')

result:

array(['2017', '2018'], dtype='datetime64[Y]')

If this is intended, then I'll make an assert with this result and remove previous test (pandas/tests/dtypes/cast/test_infer_dtype.py)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without running the test in master, it isn't obvious to the reader what is being tested. I suggest something like:

@pytest.mark.parametrize("unit", ...)
def test_loc_setitem_datetime64_non_ns_converts(self, unit):
     # GH#27928 a few words about how this used to break
    df = DataFrame(
            {
                "timestamp": [
                    np.datetime64("2017-02-11 12:41:29"),
                    np.datetime64("1991-11-07 04:22:37"),
                ]
            }
        )
        df[unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))
        df.loc[:, unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))

def test_loc_setitem_frame(self):
df = self.frame_labels

Expand Down