Skip to content

ENH: More helpful error messages for merges with incompatible keys #51861

Closed
@Caerbannog

Description

@Caerbannog

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish error messages for erroneous joins were more helpful when the key that causes problems is not evident.
As a pandas user it would help me understand errors faster.

When calling pandas.merge() without the on argument, the join is performed on the intersection of the key in left and right.
There can be dozens of such keys. In this situation, if some of the columns have an incompatible type the error message is:

You are trying to merge on object and int64 columns.

This currently does not help finding which of the dozens of columns have incompatible types.

This situation happened a few times when comparing frames with lots of auto-detected columns of which a few columns are mostly None. If one the columns happens to be full of None, its auto-detected type becomes Object, which can't be compared with int64.

Feature Description

Consider the following code:

import pandas as pd
df_left = pd.DataFrame([
  {"a": 1, "b": 1, "c": 1, "d": 1, "z": 1},
])
df_right = pd.DataFrame([
  {"a": 1, "b": 1, "c": 1, "d": None, "z": 1},
])
merge = pd.merge(df_left, df_right, indicator='source')

It fails with the following error:

ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat

A more helpful error message would be:

ValueError: You are trying to merge on int64 and object columns for key "d". If you wish to proceed you should use pd.concat

Alternative Solutions

It is always possible to find the problematic column by comparing column types manually.
This requires that the data was persisted. If the data was transient there might be no way to find out the problematic column a posteriori.

Additional Context

No response

Metadata

Metadata

Assignees

Labels

EnhancementNeeds TriageIssue that has not been reviewed by a pandas team member

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions