BUG: Merge error when merge col is int64 and Int64 #46178

Closed
@tritemio

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, left_on='col1', right_index=True))

Issue Description

Merging two DataFrames on a column (col1) that is Int64 in the first DataFrame and int64 in the second raises the following confusing error:

Traceback (most recent call last):
  File "/Users/anto/xxx/src/bugs/pandas_join.py", line 30, in <module>
    print(df1.merge(df2, on='col1', how='left'))
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/frame.py", line 9339, in merge
    return merge(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 122, in merge
    return op.get_result()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 716, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 967, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 941, in _get_join_indexers
    return get_join_indexers(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1484, in get_join_indexers
    zipped = zip(*mapped)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1481, in <genexpr>
    _factorize_keys(left_keys[n], right_keys[n], sort=sort, how=how)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 2164, in _factorize_keys
    lk = ensure_int64(np.asarray(lk))
  File "pandas/_libs/algos_common_helper.pxi", line 81, in pandas._libs.algos.ensure_int64
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
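The last frames of the traceback hint at where the message comes from: the nullable key is passed through np.asarray and then ensure_int64, and there is no integer that pd.NA can be converted to. A minimal sketch of the same failure outside of merge, assuming only that pd.NA is the value that reaches the integer conversion:

```python
import pandas as pd

# pd.NA has no integer representation, so forcing it into a plain int fails
# with the same kind of TypeError that surfaces at the end of the traceback.
try:
    int(pd.NA)
except TypeError as exc:
    message = str(exc)
    print(message)
```

The exact wording of the message depends on the Python version, but it names 'NAType' rather than the dtype mismatch between the two merge keys, which is what makes the error hard to interpret.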

Expected Behavior

If the dtypes of the merge columns are compatible, the merge should complete without errors. Alternatively, the error message should tell the user about the dtype mismatch and the need to cast one of the columns.
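As a rough illustration of the second option, the sketch below shows the kind of message that would be easier to act on. check_merge_key_dtypes is a hypothetical user-side helper, not part of pandas, and it only covers the column-versus-index case from the example above:

```python
import pandas as pd


def check_merge_key_dtypes(left, right, left_on):
    """Hypothetical helper (not a pandas API): compare the dtype of the
    left key column with the dtype of the right index before merging."""
    left_dtype = left[left_on].dtype
    right_dtype = right.index.dtype
    if left_dtype != right_dtype:
        raise TypeError(
            f"merge key {left_on!r} has dtype {left_dtype} on the left but "
            f"{right_dtype} on the right; cast one side before merging"
        )


df1 = pd.DataFrame({'col1': [1, 1, 2, 2, 3, pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': list('abc')}).set_index('col1')

# The dtypes differ (Int64 vs int64), so the helper reports the mismatch
try:
    check_merge_key_dtypes(df1, df2, 'col1')
except TypeError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```

An equality check like this is stricter than necessary, since it would also reject compatible pairs such as int32 and int64. It is only meant to show a message that names both dtypes.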

For example, casting col1 in df2 from int64 to Int64 produces the correct merge:

import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

# These two lines are required for the merge to work
df2 = df2.reset_index()
df2['col1'] = df2['col1'].astype('Int64')


print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, on='col1', how='left'))

However, this workaround seems too convoluted to me.
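The same two steps (reset the index, cast the key to Int64) can at least be chained inline so that df2 itself is left unchanged. This is a sketch of the same workaround in a more compact form, not a fix for the underlying behavior:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': pd.array([1, 1, 2, 2, 3, pd.NA], dtype='Int64')})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': list('abc')}).set_index('col1')

# Reset the index and cast the key to Int64 on the fly, then merge on the column
result = df1.merge(
    df2.reset_index().astype({'col1': 'Int64'}),
    on='col1',
    how='left',
)
print(result)
```

With how='left', the row whose key is pd.NA is kept and gets a missing value in col2.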

Installed Versions

1.4.1

Labels

  • Bug
  • NA - MaskedArrays: Related to pd.NA and nullable extension arrays
  • Reshaping: Concat, Merge/Join, Stack/Unstack, Explode