Confusing interpretation of what DataFrame.join does

From the docs I see that the difference between .merge and .join is that .join operates using indexes by default, but it also lets you use columns,  so I tried to use it for that since it sounds natural when coming from the SQL world.

From the [docs](http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.join.html):

> on : column name, tuple/list of column names, or array-like
> Column(s) to use for joining, otherwise join on index. If multiples columns given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation

From my understanding, if on is absent a join operation is performed on the index, if on is present, it would be reasonable to think that the same operation would be performed.

Having said that:

``` python
a = pd.DataFrame({'address' : ["1820 SOME STREET", "32 ANOTHER STREET",
                              "1140 BIG STREET", "20 SMALL STREET",
                              "1090 AVENUE NAME"],
                  'id' : [1,2,3,4,5]
                  })
b = pd.DataFrame({'address' : ["1090 AVENUE NAME",
                              "1140 BIG STREET", "20 SMALL STREET",
                              "1820 SOME STREET", "32 ANOTHER STREET"],
                  'id' : [5,3,4,1,2]})

a.join(b, on='id', lsuffix='_x',  rsuffix='_y', how='left')
```

Output:

```
           address_x  id_x          address_y  id_y
0   1820 SOME STREET     1    1140 BIG STREET     3
1  32 ANOTHER STREET     2    20 SMALL STREET     4
2    1140 BIG STREET     3   1820 SOME STREET     1
3    20 SMALL STREET     4  32 ANOTHER STREET     2
4   1090 AVENUE NAME     5                NaN   NaN
```

And if I add id as index:

``` python
a.set_index('id', inplace=True)
b.set_index('id', inplace=True)
a.join(b, lsuffix='_x',  rsuffix='_y', how='left')
```

Output:

```
            address_x          address_y
id
1    1820 SOME STREET   1820 SOME STREET
2   32 ANOTHER STREET  32 ANOTHER STREET
3     1140 BIG STREET    1140 BIG STREET
4     20 SMALL STREET    20 SMALL STREET
5    1090 AVENUE NAME   1090 AVENUE NAME
```

Is that the correct behavior? If yes, I think the documentation is misleading. It took me a lot to find the bug in my code and I ended up using merge since .join works in an unexpected way.

I don't think I'm the [only one](http://stackoverflow.com/questions/10114399/pandas-simple-join-not-working) with this issue, so maybe a change in the documentation would help to clarify how .join works.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Confusing interpretation of what DataFrame.join does #12188

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Confusing interpretation of what DataFrame.join does #12188

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions