ENH: read_csv dtype with datetime w/tz & integrations with category #24542

Open
@jreback

xref #23228

Datetime w/tz

I think this should work:

In [16]: df = pd.DataFrame({'Int': pd.Series([1, 2, 3], dtype='Int64'), 'Date': pd.date_range('20180101', periods=3, tz='US/Eastern')})
    ...: 

In [17]: df.dtypes
Out[17]: 
Int                          Int64
Date    datetime64[ns, US/Eastern]
dtype: object

In [18]: df.to_csv('foo.csv')

In [19]: pd.read_csv('foo.csv',index_col=0,dtype={'Int':'Int64','Date':pd.DatetimeTZDtype('ns', 'US/Eastern')})
TypeError: the dtype datetime64[ns, US/Eastern] is not supported for parsing

This probably requires #24024 first, since we need _from_sequence_of_strings, which is basically a call to to_datetime() followed by a dance to convert to the dtype (the values read may or may not already be localized).
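To make the "dance" concrete, here is a rough sketch of what the conversion might look like outside the parser. The helper name parse_tz_strings is hypothetical and is not part of pandas; it also assumes the strings are either all naive or all carry the same UTC offset.

```python
import pandas as pd


def parse_tz_strings(strings, dtype):
    """Hypothetical helper: parse strings, then coerce to a DatetimeTZDtype.

    Assumes the inputs are all naive or all share one UTC offset.
    """
    # Parsing a list of strings yields a DatetimeIndex.
    parsed = pd.to_datetime(list(strings))
    if parsed.tz is None:
        # Naive values: interpret them as wall times in the target zone.
        parsed = parsed.tz_localize(dtype.tz)
    else:
        # Already localized (e.g. "-05:00" suffix): convert to the target zone.
        parsed = parsed.tz_convert(dtype.tz)
    # Final astype aligns the resolution with the requested dtype.
    return pd.Series(parsed).astype(dtype)


target = pd.DatetimeTZDtype("ns", "US/Eastern")
aware = parse_tz_strings(["2018-01-01 00:00:00-05:00", "2018-01-02 00:00:00-05:00"], target)
naive = parse_tz_strings(["2018-01-01", "2018-01-02"], target)
```

Both calls should end up with the requested dtype; whether naive strings ought to be localized or rejected is a design question this sketch does not settle.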

Categoricals

Passing pd.CategoricalDtype should be unified with our current work-around for 'category'.

In [25]: pd.read_csv('foo.csv',index_col=0,dtype={'Int':'Int64','Date':pd.CategoricalDtype})
NotImplementedError: Extension Array: <class 'pandas.core.arrays.categorical.Categorical'> must implement _from_sequence_of_strings in order to be used in parser methods

In [27]: pd.read_csv('foo.csv',index_col=0,dtype={'Int':'Int64','Date':'category'})
Out[27]: 
   Int                       Date
0    1  2018-01-01 00:00:00-05:00
1    2  2018-01-02 00:00:00-05:00
2    3  2018-01-03 00:00:00-05:00
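Note that with the 'category' work-around the categories come back as plain strings. A minimal sketch of converting them afterwards, assuming the same data as foo.csv above (inlined here via io.StringIO so the snippet is self-contained) and using rename_categories:

```python
import io

import pandas as pd

# Same contents as foo.csv in the example above, inlined for the sketch.
csv_text = (
    ",Int,Date\n"
    "0,1,2018-01-01 00:00:00-05:00\n"
    "1,2,2018-01-02 00:00:00-05:00\n"
    "2,3,2018-01-03 00:00:00-05:00\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    dtype={"Int": "Int64", "Date": "category"},
)

# The categories are strings at this point; parse them and convert to the
# target time zone, then swap them in without touching the codes.
parsed = pd.to_datetime(df["Date"].cat.categories).tz_convert("US/Eastern")
df["Date"] = df["Date"].cat.rename_categories(parsed)
```

This only post-processes the result; it does not address the NotImplementedError raised when pd.CategoricalDtype is passed directly.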
