Description
I ran into a bug in the read_csv importer when trying to read a file with
European-style decimal encoding (e.g. 8.1 -> 8,1). Setting the decimal parameter
appropriately should make this easy, but in my case pandas refused to produce any dtype other than plain object.
After a few attempts with various files and snippets of code, I narrowed the problem down to the skipfooter parameter. As far as I can judge, skipfooter causes the decimal parameter to be ignored. Take the following example:
In [44]:
data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
data
Out[44]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
In [45]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64)
df
Out[45]:
a b c
0 1.1 2.2 3.3
1 4.0 5.0 6.0
2 7.0 8.0 9.0
3 rows × 3 columns
In [46]:
df.dtypes
Out[46]:
a float64
b float64
c float64
dtype: object
Perfect - the behaviour I expected. Now let’s append a single arbitrary footer line and skip that line during the import.
In [47]:
data = data+'\nFooter'
data
Out[47]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'
In [48]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64,skipfooter=1)
df
Out[48]:
a b c
0 1,1 2,2 3,3
1 4 5 6
2 7 8 9
3 rows × 3 columns
In [49]:
df.dtypes
Out[49]:
a object
b object
c object
dtype: object
Now all dtype information is lost, presumably because the conversion from comma-separated to dot-separated decimals failed. Adding a converter to the import (converters={'Rate': lambda x: float(x.replace('.','').replace(',','.'))}) works around the problem, which makes it more likely that the skipfooter routine is at fault.
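For anyone hitting the same thing, here is a minimal sketch of two possible workarounds applied to the example data above. The 'Rate' column from the converter snippet does not exist in this example, so the sketch uses the columns a, b and c instead. The helper name to_float and the explicit engine="python" are my own choices, and I am assuming a reasonably recent pandas; I have not verified this against 0.13.0.

```python
import io

import pandas as pd

data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'


def to_float(text):
    # Hypothetical helper: drop thousands dots, then turn the decimal
    # comma into a dot before converting to float.
    return float(text.replace('.', '').replace(',', '.'))


# Workaround 1: keep skipfooter, but convert every column explicitly
# instead of relying on the decimal parameter.
df_conv = pd.read_csv(
    io.StringIO(data),
    sep=";",
    skipfooter=1,
    engine="python",  # stated explicitly, since skipfooter is being used
    converters={name: to_float for name in ("a", "b", "c")},
)

# Workaround 2: strip the footer line by hand, so that skipfooter is
# not needed and the decimal parameter can be used as in the first example.
trimmed = "\n".join(data.splitlines()[:-1])
df_trim = pd.read_csv(io.StringIO(trimmed), sep=";", decimal=",")

print(df_conv.dtypes)
print(df_trim.dtypes)
```

The second variant avoids skipfooter entirely, which is probably the safer choice as long as the file is small enough to hold in memory as a string.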
System: IPython 2.0.0, Python 3.3.5, pandas 0.13.0