Skip to content

Feature Request: row-level Merge Status Variable #8790

Closed
@nickeubank

Description

@nickeubank

Hello All!

Moving into Pandas from R/Matlab/Stata. One feature I'm finding I really miss: a field generated during a merge that reports, for each row, whether that row came from the left dataset, the right dataset, or was successfully merged.

I do a lot of work in social science where our data is VERY dirty. For example, I'm often merging transliterated names from Pakistan, and so I just want to see after a merge how many records successfully merged, and then easily pull out the records of each time to compare. I'm including a kludge I'm using below, but an in-line option would be so nice, and I think others in my area would also appreciate it.

Thanks!

    df1['left'] = 1
    df2['right'] = 1

    mergedDF = pd.merge(df1,df2, how='outer', left_on='key1', right_on='key2')

    def mergeVar(x):
        if x['left'] == 1 and x['right'] == 1 :
            return 'both'
        elif x['left'] == 1 and x['right'] != 1:
            return 'leftOnly'
        else: return 'rightOnly'

    mergedDF['mergeReport'] = mergedDF.apply(mergeVar, axis=1)
    mergedDF.drop(['left', 'right'], axis = 1)

(I'd also be happy to help make it, but I'm relatively new to python, so would need some hand-holding to learn how to integrate a new option into an existing function...)

Metadata

Metadata

Assignees

No one assigned

    Labels

    API DesignReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions