
BUG: add date_format to read_csv / Date parsing mistake. read_csv #2586

Closed
@John-Colvin

Description


A date_format keyword could take a single format, a dict mapping columns to formats, or a list of formats (and could then make parse_dates unnecessary).
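A minimal sketch of how the three proposed forms of such a keyword might be interpreted when parsing one cell, using only the standard library. The function name parse_with_date_format is hypothetical, and treating a list as candidate formats tried in order is an assumption; the proposal does not say how a list would be used.

```python
from datetime import datetime
from typing import Dict, List, Union

# Hypothetical type for the proposed keyword: one format, a per-column dict,
# or a list of candidate formats.
DateFormat = Union[str, Dict[str, str], List[str]]


def parse_with_date_format(column: str, value: str, date_format: DateFormat) -> datetime:
    """Parse one cell according to a hypothetical date_format argument."""
    if isinstance(date_format, str):
        # A single format applies to every date column.
        return datetime.strptime(value, date_format)
    if isinstance(date_format, dict):
        # A dict maps each column name to its own format.
        return datetime.strptime(value, date_format[column])
    # Assumption: a list holds candidate formats, tried in order.
    for fmt in date_format:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"no format in {date_format!r} matches {value!r}")
```

With an explicit format such as "%d.%m.%Y %H:%M:%S.%f", every row is read the same way, so "03.08.2003" can only mean 3 August.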

Sometimes months and days get mixed up.

For example, given test.csv:

a,b
27.03.2003 14:55:00.000,1
03.08.2003 15:20:00.000,2

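read_csv("/home/john/Documents/test.csv", index_col=0, parse_dates=True)
                     b
a
2003-03-27 14:55:00  1
2003-03-08 15:20:00  2

The first row is read day-first (27 March) but the second month-first (8 March), although it should be 3 August 2003.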

The parsing doesn't appear to carry any consistency from one row to the next: each value seems to be interpreted on its own. Besides letting days and months get swapped, this makes date parsing very slow. Once you know the format, passing a parser that builds values with the datetime constructor from string slices makes read_csv about 20x faster on my machine.
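A minimal sketch of such a slicing parser for the layout in test.csv (DD.MM.YYYY HH:MM:SS.fff). The name parse_dmy_slice is hypothetical, and it assumes every value has exactly this fixed-width layout; it does no validation beyond what int() and the datetime constructor check.

```python
from datetime import datetime


def parse_dmy_slice(s: str) -> datetime:
    """Parse "DD.MM.YYYY HH:MM:SS.fff" by fixed positions, e.g. "27.03.2003 14:55:00.000"."""
    return datetime(
        int(s[6:10]),          # year
        int(s[3:5]),           # month
        int(s[0:2]),           # day
        int(s[11:13]),         # hour
        int(s[14:16]),         # minute
        int(s[17:19]),         # second
        int(s[20:23]) * 1000,  # milliseconds converted to microseconds
    )
```

Because the field positions are fixed, no per-row format guessing happens, which is presumably where the speedup comes from; the trade-off is that any row with a different layout fails or is misread.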

I think read_csv needs more parameters for specifying date formats, since in the general case a string of dates can be ambiguous (see above).

A possible approach: offer a few default formats to choose from, as well as a more general format-string option. The defaults could use the datetime constructor with string slicing, which is very fast.
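A minimal sketch of that split, assuming named defaults backed by slicing parsers and a fallback that treats anything else as a strptime format string. The names DEFAULT_FORMATS, make_parser, "dmy_dotted" and "iso" are all hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict


def _dmy_dotted(s: str) -> datetime:
    # "DD.MM.YYYY HH:MM:SS" by fixed positions; any trailing fraction is ignored.
    return datetime(int(s[6:10]), int(s[3:5]), int(s[0:2]),
                    int(s[11:13]), int(s[14:16]), int(s[17:19]))


def _iso(s: str) -> datetime:
    # "YYYY-MM-DD HH:MM:SS" by fixed positions.
    return datetime(int(s[0:4]), int(s[5:7]), int(s[8:10]),
                    int(s[11:13]), int(s[14:16]), int(s[17:19]))


# Hypothetical registry of named default formats with fast slicing parsers.
DEFAULT_FORMATS: Dict[str, Callable[[str], datetime]] = {
    "dmy_dotted": _dmy_dotted,
    "iso": _iso,
}


def make_parser(fmt: str) -> Callable[[str], datetime]:
    """Return a slicing parser for a named default, else a strptime-based one."""
    if fmt in DEFAULT_FORMATS:
        return DEFAULT_FORMATS[fmt]
    # General case: interpret fmt as a strptime format string.
    return lambda s: datetime.strptime(s, fmt)
```

The general path is more flexible but does more work per value than the fixed-position defaults.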

Perhaps add dayfirst and yearfirst flags that get passed to dateutil.parser.parse to resolve ambiguities when parsing automatically.
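dateutil.parser.parse does accept dayfirst and yearfirst keyword arguments. To keep the example dependency-free, here is a stdlib-only sketch of the same idea for the NN.NN.YYYY layout above: a flag that fixes which field is the day for every row. The name parse_ambiguous is hypothetical. Unlike the output shown earlier, it applies one order to all rows and raises ValueError instead of silently switching order.

```python
from datetime import datetime


def parse_ambiguous(s: str, dayfirst: bool = False) -> datetime:
    """Parse "NN.NN.YYYY HH:MM:SS.fff", using dayfirst to decide the NN.NN order."""
    # dayfirst=True reads the first field as the day; otherwise as the month.
    date_part = "%d.%m.%Y" if dayfirst else "%m.%d.%Y"
    return datetime.strptime(s, date_part + " %H:%M:%S.%f")
```

With dayfirst=True, "03.08.2003 15:20:00.000" becomes 3 August; with the default it becomes 8 March, and "27.03.2003 ..." is rejected because 27 is not a valid month.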

Metadata

Labels: Bug, Datetime (Datetime data dtype), IO CSV (read_csv, to_csv)
