BUG: add date_format to read_csv / Date parsing mistake. read_csv #2586

A `date_format` keyword could take a single format string, a dict mapping columns to formats, or a list of formats (and could then obviate the need for `parse_dates`).
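
To make the proposal concrete, here is a minimal standard-library sketch of how such a keyword could behave. It assumes a hypothetical reader, `read_csv_with_date_format` (not part of pandas), built on `csv.DictReader` and `datetime.strptime`. A bare string is applied to the first column (an assumption, chosen to match the index column in the example below), a dict maps column names to a format or a list of formats, and a list is tried in order until one format matches.

```python
import csv
import io
from datetime import datetime


def _parse_one(value, formats):
    """Try each candidate format in order; return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of {formats}")


def read_csv_with_date_format(text, date_format):
    """Hypothetical reader: `date_format` may be a str, a dict, or a list.

    - str:  one format, applied to the first column (assumption)
    - dict: column name -> format string or list of formats
    - list: candidate formats tried in order for the first column
    """
    reader = csv.DictReader(io.StringIO(text))
    first_col = reader.fieldnames[0]
    spec = date_format if isinstance(date_format, dict) else {first_col: date_format}
    rows = []
    for row in reader:
        for col, fmt in spec.items():
            formats = [fmt] if isinstance(fmt, str) else list(fmt)
            row[col] = _parse_one(row[col], formats)
        rows.append(row)
    return rows


data = "a,b\n27.03.2003 14:55:00.000,1\n03.08.2003 15:20:00.000,2\n"
rows = read_csv_with_date_format(data, "%d.%m.%Y %H:%M:%S.%f")
print(rows[1]["a"])  # 2003-08-03 15:20:00
```

Because the columns named in `date_format` are exactly the ones converted, this sketch needs no separate `parse_dates` list.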

Sometimes months and days get mixed up.

For example, with this test.csv:

a,b
27.03.2003 14:55:00.000,1
03.08.2003 15:20:00.000,2

read_csv("/home/john/Documents/test.csv", index_col=0, parse_dates=True)
                     b
a  
2003-03-27 14:55:00  1
2003-03-08 15:20:00  2
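
The second row is the problem: "03.08.2003" is a valid date whether it is read day-first or month-first, while "27.03.2003" can only be day-first. A parser that looks at each value in isolation can therefore pick different conventions for different rows. The standard-library snippet below only demonstrates that both readings succeed; it does not reproduce read_csv's internal logic.

```python
from datetime import datetime

value = "03.08.2003 15:20:00.000"

# Both interpretations are valid calendar dates, so the string alone
# cannot tell us which one the file's author meant.
day_first = datetime.strptime(value, "%d.%m.%Y %H:%M:%S.%f")
month_first = datetime.strptime(value, "%m.%d.%Y %H:%M:%S.%f")

print(day_first.date())    # 2003-08-03
print(month_first.date())  # 2003-03-08

# "27.03.2003" is only valid day-first: there is no month 27.
try:
    datetime.strptime("27.03.2003 14:55:00.000", "%m.%d.%Y %H:%M:%S.%f")
except ValueError:
    print("27.03.2003 cannot be month-first")
```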

There doesn't appear to be any continuity in the date parsing across rows: each value seems to be interpreted on its own. Besides making it easy for days and months to get switched around, this makes date parsing VERY slow. Once you know the format, using the datetime constructor with string slicing as a parser makes read_csv 20x faster on my machine.
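
A parser of the kind described might look like the sketch below. It assumes every value has exactly the fixed-width layout `DD.MM.YYYY HH:MM:SS.fff` from test.csv and does no validation of the separators; the function name is hypothetical. The timing comparison is against `datetime.strptime`, not against the automatic parsing the 20x figure refers to, and results will vary by machine and Python version.

```python
import timeit
from datetime import datetime

FMT = "%d.%m.%Y %H:%M:%S.%f"


def parse_ddmmyyyy_hhmmss_fff(s):
    """Fixed-width parser for strings like '27.03.2003 14:55:00.000'."""
    return datetime(
        int(s[6:10]),           # year
        int(s[3:5]),            # month
        int(s[0:2]),            # day
        int(s[11:13]),          # hour
        int(s[14:16]),          # minute
        int(s[17:19]),          # second
        int(s[20:23]) * 1000,   # milliseconds -> microseconds
    )


sample = "03.08.2003 15:20:00.000"
print(parse_ddmmyyyy_hhmmss_fff(sample))  # 2003-08-03 15:20:00

# Rough comparison with the general-purpose strptime path.
fast = timeit.timeit(lambda: parse_ddmmyyyy_hhmmss_fff(sample), number=10_000)
slow = timeit.timeit(lambda: datetime.strptime(sample, FMT), number=10_000)
print(f"slicing: {fast:.4f}s  strptime: {slow:.4f}s")
```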

I think there need to be more parameters for specifying date formats, since in the general case a string of dates can be ambiguous (see above).

A possible approach: offer a few default formats to choose from, as well as a more general format-string option. The defaults could use the datetime constructor with string slicing, which is very fast.
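
One way the two tiers could fit together is sketched below, assuming a hypothetical registry `DEFAULT_PARSERS` of named presets (the names and formats are illustrative only) and a hypothetical factory `make_date_parser`. A preset name selects a fast slicing parser; anything else is treated as a general `strptime` format string.

```python
from datetime import datetime

# Hypothetical named presets, each backed by a fast fixed-width slicing parser.
DEFAULT_PARSERS = {
    # 'YYYY-MM-DD'
    "iso_date": lambda s: datetime(int(s[0:4]), int(s[5:7]), int(s[8:10])),
    # 'DD.MM.YYYY'
    "dmy_dot": lambda s: datetime(int(s[6:10]), int(s[3:5]), int(s[0:2])),
    # 'MM/DD/YYYY'
    "mdy_slash": lambda s: datetime(int(s[6:10]), int(s[0:2]), int(s[3:5])),
}


def make_date_parser(spec):
    """Return a callable str -> datetime.

    `spec` is either the name of a preset in DEFAULT_PARSERS (fast path) or a
    general strptime format string (slower, fully general path).
    """
    if spec in DEFAULT_PARSERS:
        return DEFAULT_PARSERS[spec]
    return lambda s: datetime.strptime(s, spec)


parse = make_date_parser("dmy_dot")
print(parse("03.08.2003"))  # 2003-08-03 00:00:00

general = make_date_parser("%d.%m.%Y %H:%M")
print(general("03.08.2003 15:20"))  # 2003-08-03 15:20:00
```

Since the caller names the format once for the whole column, every row is interpreted the same way, which also removes the row-to-row inconsistency shown above.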

Perhaps add `dayfirst` and `yearfirst` flags that get passed to `dateutil.parser.parse` to resolve ambiguities when using automatic parsing.
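
`dateutil.parser.parse` does accept `dayfirst` and `yearfirst` keyword arguments. The standard-library sketch below does not use dateutil; it only illustrates the intent of applying one `dayfirst` choice to a whole column. The helper `parse_column` is hypothetical, handles only `NN.NN.YYYY` strings, and leaves out `yearfirst`.

```python
from datetime import datetime


def parse_column(values, dayfirst=False, sep="."):
    """Parse 'NN.NN.YYYY' strings with one field order for the whole column.

    Hypothetical helper mimicking a `dayfirst` flag: every row uses the same
    interpretation, so no row is guessed independently.
    """
    order = ("%d", "%m") if dayfirst else ("%m", "%d")
    fmt = sep.join(order + ("%Y",))
    return [datetime.strptime(v, fmt) for v in values]


col = ["27.03.2003", "03.08.2003"]
print([d.date().isoformat() for d in parse_column(col, dayfirst=True)])
# ['2003-03-27', '2003-08-03']
```

With `dayfirst=False`, the first value raises `ValueError` (there is no month 27) instead of the column silently switching conventions partway through.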

