
pd.read_table: Using space as delimiter on file with trailing space gives cryptic error #21768

Closed
@crypdick

Description

import numpy as np
import os
import pandas as pd

# put test.csv in same folder as script
mydir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(mydir, "test.csv")

df = pd.read_table(csv_path, sep=' ',
                   comment='#',
                   header=None,
                   skip_blank_lines=True,
                   names=["A", "B", "C", "D", "E", "F", "G"],
                   dtype={"A": np.int32,
                          "B": np.int32,
                          "C": np.float64,
                          "D": np.float64,
                          "E": np.float64,
                          "F": np.float64,
                          "G": np.int32})

test.csv:

2270433 3 21322.889 11924.667 5228.753 1.0 -1 
2270432 3 21322.297 11924.667 5228.605 1.0 2270433 
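The trailing space matters because `sep=' '` makes every single space a field boundary. The space after the last value ends one more, empty field, so each row parses as eight fields against seven `names`. Below is a minimal plain-Python check for this. It mimics splitting on a single-character separator rather than calling pandas' own tokenizer, and `report_trailing_delimiters` and `LINES` are hypothetical names:

```python
# The two rows from test.csv, including the trailing space at the end of each line.
LINES = [
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n",
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n",
]


def report_trailing_delimiters(lines, sep=" "):
    """Return (line_number, field_count) for every line that ends with `sep`.

    Hypothetical helper: str.split on a single character keeps empty strings,
    so a trailing separator shows up as an extra, empty last field.
    """
    flagged = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")  # ignore the newline, keep trailing spaces
        if line.endswith(sep):
            flagged.append((lineno, len(line.split(sep))))
    return flagged


if __name__ == "__main__":
    for lineno, n_fields in report_trailing_delimiters(LINES):
        print(f"line {lineno}: {n_fields} fields, the last one empty")
```

To check the real file, pass an open handle, e.g. `with open(csv_path) as fh: print(report_trailing_delimiters(fh))`. Iterating a text file yields its lines.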

Problem description

Attempting to load test.csv with pd.read_table() results in the following errors:
TypeError: Cannot cast array from dtype('float64') to dtype('int32') according to the rule 'safe'

and

ValueError: cannot safely convert passed user dtype of int32 for float64 dtyped data in column 2
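Why the dtype error appears: with `sep=' '`, each row has eight fields but only seven `names`. pandas documents that when data rows have one more field than there are column names, the first column is used as the row index. Every named column therefore shifts one position left. `A` gets `3`, `B` gets `21322.889`, and `G` gets the empty trailing field. Because `B` is declared `np.int32` but now holds floats, the safe cast fails. The "column 2" in the ValueError is consistent with the parser counting the inferred index column as column 0.

The sketch below reproduces the shift by reading the same two rows without `dtype`. `io.StringIO` stands in for test.csv, and `DATA`/`NAMES` are illustrative names:

```python
import io

import pandas as pd

# Same rows as test.csv; note the trailing space on each line.
DATA = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)
NAMES = ["A", "B", "C", "D", "E", "F", "G"]

# No dtype here, so the parse succeeds and the column shift is visible.
df = pd.read_table(io.StringIO(DATA), sep=" ", header=None, names=NAMES)

print(df.index.tolist())     # the first field was inferred as the row index
print(df["A"].tolist())      # holds the values meant for B
print(df["B"].dtype)         # float data, hence the int32 cast failure
print(df["G"].isna().all())  # G received the empty trailing field
```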

Expected behavior:

Either pandas ignores trailing whitespace, or it raises a more informative error than "cannot safely convert passed user dtype of int32 for float64 dtyped data in column 2". It took me a long time to figure out that the trailing spaces in the CSV were the cause.
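There are two ways to load the file as intended, sketched here against the same rows. `io.StringIO` replaces test.csv, and `DATA`, `NAMES`, and `DTYPES` are illustrative names.

The first is `index_col=False`. The pandas read_csv documentation describes this option for files with a delimiter at the end of each line: it disables the index inference and discards the extra last field. The second is `sep=r"\s+"`, which treats any run of whitespace as a single delimiter, so the trailing space does not start a new field.

```python
import io

import numpy as np
import pandas as pd

DATA = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)
NAMES = ["A", "B", "C", "D", "E", "F", "G"]
DTYPES = {"A": np.int32, "B": np.int32, "C": np.float64, "D": np.float64,
          "E": np.float64, "F": np.float64, "G": np.int32}

# Option 1: keep sep=" " but turn off index inference, so the extra empty
# last field is dropped instead of the first field becoming the index.
df_index_false = pd.read_table(
    io.StringIO(DATA), sep=" ", comment="#", header=None,
    skip_blank_lines=True, names=NAMES, dtype=DTYPES, index_col=False,
)

# Option 2: split on runs of whitespace; trailing whitespace then does not
# produce an additional field.
df_whitespace = pd.read_table(
    io.StringIO(DATA), sep=r"\s+", comment="#", header=None,
    skip_blank_lines=True, names=NAMES, dtype=DTYPES,
)

print(df_index_false.dtypes)
print(df_whitespace.equals(df_index_false))
```

`sep=r"\s+"` also merges consecutive spaces. It is only safe if the file never uses two adjacent spaces to mark an empty field. `index_col=False` keeps the single-space semantics of the original call.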

Metadata

Labels: Error Reporting (Incorrect or improved errors from pandas); IO CSV (read_csv, to_csv)
