Closed
Description
pandas.merge raises an error when a DataFrame has a unicode column name followed by a numpy.float64 column name. The error occurs only for certain numpy.float64 values. The error is: "UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 5: invalid start byte"
The following code reproduces the error:
import pandas as pd
import numpy as np

def my_test(X):
    t = pd.DataFrame([[1, 2], [3, 4]])
    u = pd.DataFrame([[9, 10], [11, 12]])
    t.rename(columns={0: unicode('a'), 1: np.float64(X)}, inplace=True)
    u.rename(columns={0: unicode('x'), 1: unicode('y')}, inplace=True)
    pd.merge(u, t, how="inner", left_index=True, right_index=True)

# Works fine for 113, but raises an error for 114
my_test(113)
my_test(114)

# Print the numbers up to 200 for which this error occurs:
problem_numbers = []
for i in range(200):
    try:
        my_test(i)
    except UnicodeDecodeError:
        problem_numbers.append(i)
print(problem_numbers)
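A possible workaround, which this report has not confirmed, is to convert every column label to text before merging so that the column index no longer mixes unicode and numpy.float64 labels. The sketch below is written for Python 3; the helper name merge_with_text_labels is hypothetical, and on Python 2 unicode would presumably take the place of str.

```python
import numpy as np
import pandas as pd


def merge_with_text_labels(left, right):
    """Index-merge two frames after converting all column labels to text.

    Hypothetical helper: an untested workaround idea, not a confirmed fix.
    """
    # rename() accepts a callable that is applied to each column label
    left = left.rename(columns=str)
    right = right.rename(columns=str)
    return pd.merge(left, right, how="inner", left_index=True, right_index=True)


t = pd.DataFrame([[1, 2], [3, 4]])
u = pd.DataFrame([[9, 10], [11, 12]])
t = t.rename(columns={0: "a", 1: np.float64(114)})
u = u.rename(columns={0: "x", 1: "y"})

merged = merge_with_text_labels(u, t)
print(list(merged.columns))
```

The trade-off is that the float label becomes a string, so later lookups have to use the text form of the label rather than the original number.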