
Is v = np.array(v.dt.to_pydatetime()) still necessary? #4836

Closed
@MarcoGorelli


As far as I can tell, the lines

elif v.dtype.kind == "M":
    # Convert datetime Series/Index to numpy array of datetimes
    if isinstance(v, pd.Series):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", FutureWarning)
            # Series.dt.to_pydatetime will return Index[object]
            # https://github.com/pandas-dev/pandas/pull/52459
            v = np.array(v.dt.to_pydatetime())

were introduced in #1163 to work around issues with displaying numpy datetime64 arrays.
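To make the behaviour of that branch concrete, here is a minimal standalone sketch of it. `legacy_to_datetime_objects` is a hypothetical name, not a plotly function, and it assumes the input is a tz-naive datetime `pd.Series` or `pd.DatetimeIndex`, as in the `kind == "M"` branch. The result is an object array of stdlib `datetime.datetime` values rather than a `datetime64` array.

```python
import datetime
import warnings

import numpy as np
import pandas as pd


def legacy_to_datetime_objects(v):
    """Hypothetical standalone copy of the branch quoted above: turn a
    datetime Series/DatetimeIndex into an object array of datetime.datetime."""
    if isinstance(v, pd.Series):
        with warnings.catch_warnings():
            # Recent pandas warns that Series.dt.to_pydatetime will return a
            # Series instead of an ndarray; np.array() handles either result.
            warnings.simplefilter("ignore", FutureWarning)
            return np.array(v.dt.to_pydatetime())
    # DatetimeIndex.to_pydatetime already returns an object ndarray
    return v.to_pydatetime()


s = pd.Series(pd.date_range("2000-01-01", periods=3, freq=pd.offsets.Hour()))
out = legacy_to_datetime_objects(s)
print(out.dtype)                # object
print(type(out[0]).__name__)    # datetime
print(out[0] == datetime.datetime(2000, 1, 1, 0, 0))  # True
```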

However, have the numpy datetime64 issues since been fixed? Having built plotly.py from source, here's what I see on the master branch:

[Screenshot: the resulting plot on master]

It looks like it displays fine.

If I apply the following diff:

--- a/packages/python/plotly/_plotly_utils/basevalidators.py
+++ b/packages/python/plotly/_plotly_utils/basevalidators.py
@@ -95,20 +95,7 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
 
     # Handle pandas Series and Index objects
     if pd and isinstance(v, (pd.Series, pd.Index)):
-        if v.dtype.kind in numeric_kinds:
-            # Get the numeric numpy array so we use fast path below
-            v = v.values
-        elif v.dtype.kind == "M":
-            # Convert datetime Series/Index to numpy array of datetimes
-            if isinstance(v, pd.Series):
-                with warnings.catch_warnings():
-                    warnings.simplefilter("ignore", FutureWarning)
-                    # Series.dt.to_pydatetime will return Index[object]
-                    # https://github.com/pandas-dev/pandas/pull/52459
-                    v = np.array(v.dt.to_pydatetime())
-            else:
-                # DatetimeIndex
-                v = v.to_pydatetime()
+        v = v.values

then pandas datetime Series still appear to display fine:

[Screenshot: the resulting plot with the diff applied]
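One quick way to check that the simplified `v = v.values` path loses nothing is to compare it with the legacy path directly. This sketch only covers tz-naive data: casting the legacy object arrays back to the same datetime64 unit should give exactly the values that `.values` returns, for both a Series and a DatetimeIndex.

```python
import warnings

import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2000-01-01", periods=48, freq=pd.offsets.Hour()))
idx = pd.DatetimeIndex(s)

# Path proposed in the diff: a single `.values` call for Series and Index
new_series = s.values
new_index = idx.values

# Legacy path: stdlib datetime objects in an object array
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    old_series = np.array(s.dt.to_pydatetime())
old_index = idx.to_pydatetime()

print(new_series.dtype.kind, old_series.dtype)  # M object

# Casting the object arrays to the same datetime64 unit recovers the same instants
assert np.array_equal(old_series.astype(new_series.dtype), new_series)
assert np.array_equal(old_index.astype(new_index.dtype), new_index)
```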


I'm asking in the context of #4790, since copy_to_readonly_numpy_array would need to handle other kinds of inputs (not just pandas Series / Index).
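As a rough illustration of what a more input-agnostic conversion could look like, here is a sketch that drops the datetime-specific branch and relies on `to_numpy()` when the input provides it. `to_readonly_array` is a hypothetical helper, not plotly's `copy_to_readonly_numpy_array`, and duck-typing on `to_numpy` is an assumption about how other inputs might be handled; tz-aware data is not considered.

```python
import numpy as np
import pandas as pd


def to_readonly_array(v):
    """Hypothetical helper (not plotly's API): copy array-like input into a
    read-only NumPy array without any datetime-specific branch."""
    # Objects exposing to_numpy (e.g. pandas Series/Index) convert natively,
    # so tz-naive datetimes stay datetime64 instead of becoming objects.
    arr = v.to_numpy() if hasattr(v, "to_numpy") else np.asarray(v)
    arr = np.array(arr, copy=True)  # never share memory with the caller
    arr.flags.writeable = False
    return arr


s = pd.Series(pd.date_range("2000-01-01", periods=3, freq=pd.offsets.Hour()))
out = to_readonly_array(s)
print(out.dtype.kind, out.flags.writeable)  # M False
```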

A plain conversion to numpy would also be much faster than going via stdlib datetime objects:

In [23]: %time np.array(s.dt.to_pydatetime())
CPU times: user 325 ms, sys: 8.34 ms, total: 333 ms
Wall time: 360 ms
Out[23]:
array([datetime.datetime(2000, 1, 1, 0, 0),
       datetime.datetime(2000, 1, 1, 1, 0),
       datetime.datetime(2000, 1, 1, 2, 0), ...,
       datetime.datetime(2114, 1, 29, 13, 0),
       datetime.datetime(2114, 1, 29, 14, 0),
       datetime.datetime(2114, 1, 29, 15, 0)], dtype=object)

In [24]: %time s.to_numpy()
CPU times: user 46 μs, sys: 0 ns, total: 46 μs
Wall time: 51.5 μs
Out[24]:
array(['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000',
       '2000-01-01T02:00:00.000000000', ...,
       '2114-01-29T13:00:00.000000000', '2114-01-29T14:00:00.000000000',
       '2114-01-29T15:00:00.000000000'], dtype='datetime64[ns]')

Labels: P2 (considered for next cycle), feature (something new)
