pandas.io.gbq.read_gbq() returns incorrect results #5840

Closed
@markdregan

read_gbq() returns incorrect results when I query a BigQuery table.

I compared the output of read_gbq() with a CSV export taken directly from BigQuery. Both contain the same number of rows, but the read_gbq() output has many duplicate rows.

I'm using pandas '0.13.0rc1-125-g4952858' and NumPy '1.8.0' with Python 2.7 on Mac OS X 10.9.

The code I run to load the data into pandas:
from pandas.io import gbq
churn_data = gbq.read_gbq(train_query, project_id=projectid)

I can't share the underlying data. What additional information would help identify the root cause?

The output is ~400k rows.
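
Since the data itself can't be shared, a summary of the mismatch may be the next best thing. Below is a minimal sketch of how the comparison could be scripted, assuming both outputs are loaded as DataFrames with the same columns and dtypes. The helper name `summarize_mismatch` and the toy columns `user_id` and `churned` are hypothetical stand-ins, not taken from the real table.

```python
import pandas as pd


def summarize_mismatch(gbq_df, csv_df):
    """Summarize how two DataFrames that should hold the same rows differ.

    Hypothetical helper: assumes identical column order and dtypes.
    """
    # Rows that exactly repeat an earlier row in the same frame.
    gbq_dupes = int(gbq_df.duplicated().sum())
    csv_dupes = int(csv_df.duplicated().sum())

    # Distinct rows on each side, as sets of plain tuples.
    gbq_rows = set(gbq_df.itertuples(index=False, name=None))
    csv_rows = set(csv_df.itertuples(index=False, name=None))

    return {
        "gbq_rows": len(gbq_df),
        "csv_rows": len(csv_df),
        "gbq_duplicates": gbq_dupes,
        "csv_duplicates": csv_dupes,
        # Distinct rows in the CSV export that never appear in the read_gbq() output.
        "missing_from_gbq": len(csv_rows - gbq_rows),
    }


# Toy stand-ins for the two outputs: same row count, one row duplicated
# in place of another on the read_gbq() side.
csv_df = pd.DataFrame({"user_id": [1, 2, 3, 4], "churned": [0, 1, 0, 1]})
gbq_df = pd.DataFrame({"user_id": [1, 2, 2, 4], "churned": [0, 1, 1, 1]})

summary = summarize_mismatch(gbq_df, csv_df)
print(summary)
```

If the number of duplicates matches the number of missing rows, that would suggest some rows are being repeated in place of others rather than simply appended. Note that the set comparison treats NaN values as unequal, and a CSV round trip can change dtypes, so either could inflate the "missing" count on real data.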
