Description
```python
import os

import numpy as np
import pandas as pd

# test.csv is expected in the same folder as this script
mydir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(mydir, "test.csv")

# Raises TypeError / ValueError when the lines of test.csv end with trailing spaces
df = pd.read_table(csv_path, sep=' ',
                   comment='#',
                   header=None,
                   skip_blank_lines=True,
                   names=["A", "B", "C", "D", "E", "F", "G"],
                   dtype={"A": np.int32,
                          "B": np.int32,
                          "C": np.float64,
                          "D": np.float64,
                          "E": np.float64,
                          "F": np.float64,
                          "G": np.int32})
```
test.csv (the lines end with trailing spaces, which are not visible here):

```
2270433 3 21322.889 11924.667 5228.753 1.0 -1
2270432 3 21322.297 11924.667 5228.605 1.0 2270433
```
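A minimal sketch of why the trailing space matters, assuming each line ends with exactly one space: with `sep=' '`, the space before the newline is a delimiter, so every row gets an extra, empty eighth field. An empty field is read as NaN, which forces that column to float64. With the seven names from the script, pandas most likely treats the surplus leading field as the index (the `read_csv` docs mention `index_col=False` for files with delimiters at the end of each line), so the all-NaN float64 field would land in a column declared as int32, which plausibly explains the "cannot safely convert" error.

```python
import io

import pandas as pd

# Same two rows as test.csv, written with an explicit trailing space on each
# line (one trailing space per line is an assumption for illustration).
data = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)

# No names/dtype here, so pandas reports every field it actually sees.
raw = pd.read_csv(io.StringIO(data), sep=" ", header=None)

print(raw.shape)            # (2, 8): one more column than the 7 names
print(raw[7].isna().all())  # the extra trailing column is entirely NaN
print(raw[7].dtype)         # float64, because NaN cannot live in an int column
```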
Problem description
Attempting to load test.csv with pd.read_table() results in the following errors:
```
TypeError: Cannot cast array from dtype('float64') to dtype('int32') according to the rule 'safe'
```

and

```
ValueError: cannot safely convert passed user dtype of int32 for float64 dtyped data in column 2
```
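Since neither message mentions whitespace, and the cause here turned out to be trailing spaces in the CSV, a quick pre-check can save time. The sketch below uses a hypothetical helper, `find_trailing_whitespace`, which is not part of pandas; it lists the lines that end in spaces or tabs, ignoring the line terminator itself.

```python
import io
from typing import Iterable, List, Tuple


def find_trailing_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line ending in spaces or tabs."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        body = line.rstrip("\r\n")  # drop the line terminator before checking
        if body != body.rstrip(" \t"):
            hits.append((lineno, body))
    return hits


sample = io.StringIO(
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433\n"
)
for lineno, line in find_trailing_whitespace(sample):
    print(f"line {lineno} has trailing whitespace: {line!r}")
```

For a real file, pass an open file handle instead, e.g. `with open(csv_path, encoding="utf-8") as fh: find_trailing_whitespace(fh)`.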
Expected behavior:
Either pandas should ignore trailing whitespace, or it should raise a more informative error than "cannot safely convert passed user dtype of int32 for float64 dtyped data". It took me a long time to figure out that trailing spaces in the CSV were the cause.
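One possible workaround, sketched under the assumption that the columns are only ever separated by whitespace: pass `sep=r"\s+"` instead of `sep=' '`. pandas then treats any run of whitespace as a single delimiter, and whitespace at the end of a line does not produce an extra empty field, so the declared int32 dtypes can be applied. The in-memory data below stands in for test.csv.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for test.csv, with a trailing space at the end of each line.
data = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)

df = pd.read_table(
    io.StringIO(data),
    sep=r"\s+",  # any run of whitespace is one delimiter; trailing whitespace adds no field
    comment="#",
    header=None,
    names=["A", "B", "C", "D", "E", "F", "G"],
    dtype={"A": np.int32, "B": np.int32,
           "C": np.float64, "D": np.float64,
           "E": np.float64, "F": np.float64,
           "G": np.int32},
)
print(df.dtypes)
```

The trade-off is that `\s+` also collapses consecutive spaces, so it is only suitable when no field is ever legitimately empty.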