read_csv: Infers different column types in different runs #13604

Closed
@aptiko

Description

#!/usr/bin/env python3

from io import StringIO

import pandas as pd

test_timeseries = """\
2008-02-07 09:40,1032.43
2008-02-07 09:50,1042.54
2008-02-07 10:00,1051.65
"""

df = pd.read_csv(StringIO(test_timeseries), parse_dates=[0],
                 usecols=['date', 'value'], index_col=0, header=None,
                 names=('date', 'value'))
print(df.value.dtype)

When I run this program 10 times, the printed dtype of the value column is sometimes float64 and sometimes object.
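A rough way to tally the outcome over repeated runs is sketched below. It starts a fresh interpreter for each run, on the assumption that this matches running the program 10 times from the shell. The helper name `tally_dtypes` and the `SCRIPT` constant are made up for illustration, and pandas must be installed for the interpreter found in `sys.executable`.

```python
import subprocess
import sys
from collections import Counter

# The reproducer from this report, run in a new interpreter each time.
# The data is written on one line so it can live inside this string.
SCRIPT = r'''
from io import StringIO
import pandas as pd

data = "2008-02-07 09:40,1032.43\n2008-02-07 09:50,1042.54\n2008-02-07 10:00,1051.65\n"
df = pd.read_csv(StringIO(data), parse_dates=[0],
                 usecols=['date', 'value'], index_col=0, header=None,
                 names=('date', 'value'))
print(df.value.dtype)
'''


def tally_dtypes(runs=10):
    """Run the reproducer `runs` times and count each printed dtype."""
    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", SCRIPT],
            capture_output=True, text=True, check=True,
        )
        counts[result.stdout.strip()] += 1
    return counts


counts = tally_dtypes()
print(dict(counts))
```

If more than one key appears in the printed dictionary, the inferred dtype differed between runs on that installation.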

This happens with pandas 0.18.1 on Debian Jessie amd64 with Python 3.4.2 and numpy 1.11.1. I don't see it happening with Debian's packaged pandas 0.14.1.

I can work around this by specifying the dtype argument, but shouldn't pandas behave deterministically when it's omitted?
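For reference, a minimal sketch of the dtype workaround mentioned above. The choice of `float64` and the dict form keyed by the column name `value` are assumptions about how one might pin the type. This is not a fix for the underlying behavior.

```python
from io import StringIO

import pandas as pd

test_timeseries = """\
2008-02-07 09:40,1032.43
2008-02-07 09:50,1042.54
2008-02-07 10:00,1051.65
"""

# Same call as in the report, with the value column's dtype given
# explicitly so that no type inference is needed for it.
df = pd.read_csv(StringIO(test_timeseries), parse_dates=[0],
                 usecols=['date', 'value'], index_col=0, header=None,
                 names=['date', 'value'],
                 dtype={'value': 'float64'})
print(df['value'].dtype)
```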

Labels: Bug, Dtype Conversions (unexpected or buggy dtype conversions), IO CSV (read_csv, to_csv), Testing (pandas testing functions or related to the test suite)
