read_table/csv unexpected type dependence on delimiter #2601

Closed
@dsm054

While answering a question on Stack Overflow, I came across something that puzzled me:

>>> import pandas as pd
>>> pd.__version__
'0.10.0b1'
>>> repr(open('cusip.txt').read())
"'65248E10 11\\n55555E55 22\\n'"
>>> !cat cusip.txt
65248E10 11
55555E55 22
>>> df = pd.read_table("cusip.txt", header=None, sep=" ")
>>> df
              0   1
0  6.524800e+14  11
1  5.555500e+59  22
>>> type(df[0][0])
<type 'numpy.float64'>
>>> df = pd.read_table("cusip.txt", header=None, sep=r"\s+")
>>> df
                     0   1
0      652480000000000  11
1 -9223372036854775808  22
>>> type(df[0][0])
<type 'numpy.int64'>

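Changing the delimiter from " " to r"\s+" changed the inferred type of the first column from float64 to int64, even though the file contents are identical. With r"\s+", the second value also comes out as -9223372036854775808 (the minimum int64), which suggests an overflow when 55555E55 is converted to an integer.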
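
For reference, here is a small standard-library sketch of why these identifiers look numeric at all. It does not reproduce pandas' parser; it only assumes that a parser treating the tokens as scientific notation would see the same values that Python's float() does. The names tokens, INT64_MAX and INT64_MIN are introduced here for illustration.

```python
# CUSIP-like identifiers from the report. Both happen to be valid
# scientific-notation literals: digits, 'E', then an exponent.
tokens = ["65248E10", "55555E55"]

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

for tok in tokens:
    as_float = float(tok)  # e.g. 65248E10 is read as 65248 * 10**10
    fits = INT64_MIN <= as_float <= INT64_MAX
    print(tok, as_float, "fits in int64" if fits else "exceeds int64 range")

# The first token is small enough to be represented exactly as an integer.
first_as_int = int(float(tokens[0]))

# The second token is far beyond the int64 range, so any cast to int64
# cannot preserve its value.
second_fits_int64 = float(tokens[1]) <= INT64_MAX
```

The first value matches the 652480000000000 shown in the r"\s+" output, and the second is far too large for int64, which is consistent with the minimum int64 value appearing in that row. If the column is meant to hold identifiers rather than numbers, reading it as strings (for example through the reader's converters or dtype options, where the installed pandas version supports them) would sidestep the numeric inference entirely.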
