error merging dataframes with unicode and numpy.float64 column names #13353

Closed
@corakingdon

Description

pandas.merge raises an error when one of the DataFrames has a unicode column name followed by a numpy.float64 column name. The error occurs only for certain numpy.float64 values. The error is: "UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 5: invalid start byte"

The following code reproduces the error:

import pandas as pd
import numpy as np

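def my_test(X):
    # Two small frames that share the default integer index.
    t = pd.DataFrame([[1, 2], [3, 4]])
    u = pd.DataFrame([[9, 10], [11, 12]])

    # t gets a unicode name followed by a numpy.float64 name; u gets two unicode names.
    # (unicode() is the Python 2 built-in, so this script targets Python 2.)
    t.rename(columns={0: unicode('a'), 1: np.float64(X)}, inplace=True)
    u.rename(columns={0: unicode('x'), 1: unicode('y')}, inplace=True)

    pd.merge(u, t, how="inner", left_index=True, right_index=True)

# Works fine for 113, but raises UnicodeDecodeError for 114.
my_test(113)
my_test(114)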

# Collect and print the numbers below 200 for which this error occurs:
problem_numbers=[]
for i in range(200):
    try:
        my_test(i)
    except UnicodeDecodeError:
        problem_numbers.append(i)

print(problem_numbers)
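
One observation that may help narrow this down: the little-endian IEEE 754 encoding of 114.0 is the byte string 00 00 00 00 00 80 5C 40, which has 0x80 at position 5, exactly the byte and position named in the error message. The encoding of 113.0 ends in 40 5C 40, which is plain ASCII. The sketch below is only a guess at the mechanism. It assumes that somewhere in the merge path the raw 8 bytes of the float64 label are decoded as UTF-8; nothing in this report confirms that. The helper names `float64_raw_bytes` and `decodes_as_utf8` are made up for illustration and are not pandas or numpy APIs. It uses only the standard library.

```python
import struct


def float64_raw_bytes(value):
    """Return the 8-byte little-endian IEEE 754 encoding of value."""
    return struct.pack("<d", float(value))


def decodes_as_utf8(value):
    """Return True if the raw float64 bytes of value form valid UTF-8.

    Assumption: the merge failure comes from decoding these raw bytes.
    """
    try:
        float64_raw_bytes(value).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Integers below 200 whose raw float64 bytes are not valid UTF-8.
suspect_numbers = [i for i in range(200) if not decodes_as_utf8(i)]

print(decodes_as_utf8(113), decodes_as_utf8(114))
print(suspect_numbers)
```

If `suspect_numbers` matches the `problem_numbers` list printed by the script above, that would support the raw-bytes explanation; if the two lists differ, the cause lies elsewhere.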
