Description
Code Sample, a copy-pastable example if possible
import pandas as pd
big_index = [123, 124, 125, 126, 127, 128, 129, 130]
big_dat = {'year': pd.Series([2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004], index=big_index),
'other': pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], index=big_index)}
big_df = pd.DataFrame(big_dat)
year_index = [2003, 2000, 2001, 2002]
year_dat = {'a': pd.Series([1, 2, 3, 4], index=year_index),
'b': pd.Series([5, 6, 7, 8], index=year_index)}
year_df = pd.DataFrame(year_dat)
merged_right = pd.merge(
big_df,
year_df,
how='right',
left_on='year',
right_index=True
)
merged_outer = pd.merge(
big_df,
year_df,
how='outer',
left_on='year',
right_index=True
)
merged_right
Out[5]:
other year a b
123 a 2000 2 6
124 b 2000 2 6
125 c 2000 2 6
126 d 2001 3 7
127 e 2002 4 8
128 f 2002 4 8
129 g 2002 4 8
130 NaN 2003 1 5
merged_outer
Out[6]:
other year a b
123 a 2000 2.0 6.0
124 b 2000 2.0 6.0
125 c 2000 2.0 6.0
126 d 2001 3.0 7.0
127 e 2002 4.0 8.0
128 f 2002 4.0 8.0
129 g 2002 4.0 8.0
130 h 2004 NaN NaN
130 NaN 2003 1.0 5.0
Problem description
merged outer and merged right both return year 2003, a 1, b 5 associated with index 130.
Expected Output
Either a NaN index entry or an error/warning. There's no obvious reason to associate that data with 130 index
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None