Closed
A follow-up to #13237. Examples copied from there:
Here's what to_numeric shows:
In [137]: pd.to_numeric(o)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:55708)()
ValueError: Unable to parse string "52721156299871854681072370812488856336599863860274272781560003496046130816295143841376557767523688876482371118681808060318819187029855011862637267061548684520491431327693943042785705302632892888308342787190965977140539558800921069356177102235514666302335984730634641934384020650987601849970936578094137344.00000"
And here's what read_csv shows (the data is at ftp://ftp.sanger.ac.uk/pub/consortia/ibdgenetics/iibdgc-trans-ancestry-summary-stats.tar):
In [138]: d = pd.read_csv('EUR.UC.gwas.assoc', delim_whitespace=True, usecols=['OR'], dtype={'OR': np.float64})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14411)()
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'
During handling of the above exception, another exception occurred:
[... long stacktrace ...]
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14632)()
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 8
@jreback
I've finally started looking into this, and it seems I can't implement it cleanly without changing NumPy: in the end, it is NumPy that gives no row/value information, although pandas conditionally replaces the exception with its own.
I could write an ad-hoc implementation for numeric conversion using pd.to_numeric, though, and use its row/value information if it raises an exception. What do you think?
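To make the proposal concrete, here is a rough sketch of what such an ad-hoc fallback might look like at the user level. It is not the parser change itself: the helper names `find_unparseable` and `to_float_with_context` are hypothetical, and instead of parsing the message of the exception raised by pd.to_numeric it uses `errors="coerce"` and compares against the original values to locate the offending rows.

```python
import pandas as pd


def find_unparseable(values):
    """Return (position, value) pairs that pd.to_numeric cannot parse.

    Hypothetical helper for illustration; not part of pandas.
    """
    s = pd.Series(values, dtype=object)
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    converted = pd.to_numeric(s, errors="coerce")
    # An entry is "bad" if it was present in the input but became NaN.
    bad = converted.isna() & s.notna()
    return [(i, s.iloc[i]) for i in range(len(s)) if bad.iloc[i]]


def to_float_with_context(values, column="<unknown>"):
    """Convert to float64, reporting the first offending row and value.

    Hypothetical helper for illustration; not part of pandas.
    """
    bad = find_unparseable(values)
    if bad:
        pos, val = bad[0]
        raise ValueError(
            f"cannot convert column {column!r} to float64: "
            f"unable to parse {val!r} at row {pos}"
        )
    return pd.to_numeric(pd.Series(values, dtype=object)).astype("float64")


if __name__ == "__main__":
    print(find_unparseable(["1.5", "abc", None, "2"]))
    try:
        to_float_with_context(["1.5", "abc"], column="OR")
    except ValueError as exc:
        print(exc)
```

One caveat with this approach: a literal "nan" string parses to NaN and would therefore also be reported as unparseable, so a real implementation would need to treat recognised NA markers separately.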