Skip to content

Right/outer merge behaviour on left column and right index is unexpected #17257

Open
@ianepreston

Description

@ianepreston

Code Sample, a copy-pastable example if possible

import pandas as pd
big_index = [123, 124, 125, 126, 127, 128, 129, 130]
big_dat = {'year': pd.Series([2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004], index=big_index),
          'other': pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], index=big_index)}
big_df = pd.DataFrame(big_dat)

year_index = [2003, 2000, 2001, 2002]
year_dat = {'a': pd.Series([1, 2, 3, 4], index=year_index),
            'b': pd.Series([5, 6, 7, 8], index=year_index)}
year_df = pd.DataFrame(year_dat)

merged_right = pd.merge(
        big_df,
        year_df,
        how='right',
        left_on='year',
        right_index=True
        )
merged_outer = pd.merge(
        big_df,
        year_df,
        how='outer',
        left_on='year',
        right_index=True
        )
merged_right
Out[5]: 
    other  year  a  b
123     a  2000  2  6
124     b  2000  2  6
125     c  2000  2  6
126     d  2001  3  7
127     e  2002  4  8
128     f  2002  4  8
129     g  2002  4  8
130   NaN  2003  1  5

merged_outer
Out[6]: 
    other  year    a    b
123     a  2000  2.0  6.0
124     b  2000  2.0  6.0
125     c  2000  2.0  6.0
126     d  2001  3.0  7.0
127     e  2002  4.0  8.0
128     f  2002  4.0  8.0
129     g  2002  4.0  8.0
130     h  2004  NaN  NaN
130   NaN  2003  1.0  5.0

Problem description

merged outer and merged right both return year 2003, a 1, b 5 associated with index 130.

Expected Output

Either a NaN index entry or an error/warning. There's no obvious reason to associate that data with 130 index

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions