Description
date_format
keyword could take the format, dict of columns to format, or list of formats (and could then obviate the need for parse_dates)
Sometimes months and days get mixed up.
E.g.
test.csv:
a,b
27.03.2003 14:55:00.000,1
03.08.2003 15:20:00.000,2
read_csv("/home/john/Documents/test.csv",index_col=0, parse_dates=True)
b
a
2003-03-27 14:55:00 1
2003-03-08 15:20:00 2
There doesn't appear to be any continuity in the date parsing over the rows. As well as meaning things can easily get switched around, this makes date parsing VERY slow. Once you know the format, using the datetime constructor with string slicing as a parser makes read_csv 20x faster on my machine.
I think there needs to be some more parameters for specifying date formats. seeing as in the general case dates a string of dates can be ambiguous (see above).
A possible approach: Have a few default formats to choose from, as well as a more general format string approach. Obviously the defaults could use the datetime constructor with string slicing, which is very fast.
Perhaps have a dayfirst and yearfirst flag that gets passed to dateutil.parser.parse to solve ambiguities when using automatic parsing.