Improved documentation for DataFrame.join #12193

Closed
edublancas wants to merge 2 commits

edublancas
Contributor

closes #12188

I modified the description of `DataFrame.join` to clarify how it differs from `DataFrame.merge`, and added examples.

jreback added the Docs and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Feb 1, 2016
* left: use calling frame's index or column(s)
* right: use other frame's index
* outer: form union of calling frame's index or column(s) with
other frame's index
Contributor

I think this is more confusing, as it depends on whether `on` is specified or not (so you can simply say that).

Contributor Author

What about this?

* left: use calling frame's index (or column if `on` is specified)
* right: use other frame's index
* outer: form union of calling frame's index (or column if `on` is specified) with other frame's index
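
For readers following the thread, here is a minimal sketch of the three `how` options described in this list for the index-on-index case. The two frames and their labels are invented for illustration and do not come from the PR.

```python
import pandas as pd

# Hypothetical frames, invented for illustration only.
caller = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["x", "y", "z"])
other = pd.DataFrame({"B": ["B0", "B1"]}, index=["y", "w"])

# left: the result is labelled by the calling frame's index.
left = caller.join(other, how="left")

# right: the result is labelled by the other frame's index.
right = caller.join(other, how="right")

# outer: the result is labelled by the union of both indexes.
outer = caller.join(other, how="outer")

print(left)
print(right)
print(outer)
```

Labels missing from one side show up as NaN in that side's columns.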

jreback added this to the 0.18.0 milestone Feb 1, 2016
@jreback
Contributor

jreback commented Feb 9, 2016

can you update

jreback modified the milestones: 0.18.1, 0.18.0 Feb 11, 2016
DOC: improves DataFrame.join documentation
@edublancas
Contributor Author

Done, sorry for the delay.


Perform a left join using caller's key column and other frame's index

>>> caller.join(other.set_index('key'), on='key', how='left',
Contributor

Can you also add this same example without using `.set_index` (and without `on`), and indicate the difference between them?

Contributor Author

edublancas commented Apr 21, 2016

Just add this example? `caller.join(other, how='left', lsuffix='_l', rsuffix='_r')`

Member

@jreback I don't think it is possible to not use `set_index`, as `join` always uses the index of `other` (which is actually really confusing ...)

@edublancas the `lsuffix='_l', rsuffix='_r'` is redundant in this case, so I would leave it out
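
To make the point about `join` always using the index of `other` concrete, here is a small sketch of what happens without `set_index` and without `on`. The frames are hypothetical; both keep their default integer index and both carry a `key` column, which is an assumption made for this illustration.

```python
import pandas as pd

# Hypothetical frames; note that other's keys are deliberately out of order.
caller = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
other = pd.DataFrame({"key": ["K1", "K0"], "B": ["B1", "B0"]})

# With overlapping column names and no suffixes, join refuses to proceed.
try:
    caller.join(other, how="left")
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

# With suffixes the call succeeds, but rows are aligned on the index labels
# (0, 1, 2), not on the values in the 'key' columns.
joined = caller.join(other, how="left", lsuffix="_l", rsuffix="_r")
print(joined)
```

In this sketch the first row pairs `K0` from `caller` with `K1` from `other`, because both sit at index label 0. The suffixes are only needed here because both frames still have a `key` column; once `other.set_index('key')` moves that column into the index, there is no overlap left.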

Contributor Author

edublancas commented Apr 21, 2016

What about having just these two examples:

Perform a left join using caller's key column and other frame's index

caller.join(other.set_index('key'), on='key', how='left')

Set key as the index column on caller and other, then perform an index-on-index join.

caller.set_index('key').join(other.set_index('key'), how='left')
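
As a rough sketch of how these two calls differ in their output, the snippet below runs both on small hypothetical frames (the data is invented here, not taken from the docstring under review).

```python
import pandas as pd

# Hypothetical frames, invented for illustration only.
caller = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
other = pd.DataFrame({"key": ["K1", "K0"], "B": ["B1", "B0"]})

# Column-on-index: caller's 'key' column is matched against other's index.
# The result keeps caller's original index and its 'key' column.
on_column = caller.join(other.set_index("key"), on="key", how="left")

# Index-on-index: 'key' becomes the index of both frames before joining.
# The result is indexed by 'key' and has no separate 'key' column.
on_index = caller.set_index("key").join(other.set_index("key"), how="left")

print(on_column)
print(on_index)
```

Both calls match the same rows; the difference is whether `key` ends up as a regular column (with `on`) or as the index of the result (with `set_index` on both sides).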

@jreback
Contributor

jreback commented Mar 12, 2016

pls rebase/update

@edublancas
Contributor Author

Sorry for the delay, I've been working to meet some deadlines for a project. I'll update in the next few days.

@jreback
Contributor

jreback commented Apr 18, 2016

can you rebase/update

jreback removed this from the 0.18.1 milestone Apr 18, 2016
index-on-index and index-on-column(s) joins, but *joins on indexes* by default
rather than trying to join on common columns (the default behavior for
``merge``). If you are joining on index, you may wish to use ``DataFrame.join``
to save yourself some typing.
Member

Why did you shorten this?
I think the "joins on indexes by default" part is a very useful explanation.

Contributor Author

I think the shorter explanation is better:

index-on-index (by default) and column(s)-on-index join. If you are joining on index only, you may wish to use DataFrame.join to save yourself some typing.
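
The difference in defaults being discussed can be sketched as follows. The frames are hypothetical, and the `merge` call with `left_index=True, right_index=True` is shown only as the longer spelling that `join` is meant to save.

```python
import pandas as pd

# Hypothetical frames, invented for illustration only.
left = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"B": [10, 20]}, index=["a", "c"])

# join: index-on-index left join by default.
via_join = left.join(right)

# The same operation spelled out with merge.
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="left")

# merge without arguments joins on the common columns instead (inner join).
left_cols = pd.DataFrame({"key": ["a", "b"], "A": [1, 2]})
right_cols = pd.DataFrame({"key": ["b", "c"], "B": [20, 30]})
on_common = pd.merge(left_cols, right_cols)

print(via_join)
print(on_common)
```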

@jreback
Contributor

jreback commented May 7, 2016

can you update according to comments

@jreback
Contributor

jreback commented May 25, 2016

can you rebase / update?

@edublancas
Contributor Author

edublancas commented May 25, 2016

I think the documentation is clear now. There are three examples: one using the DataFrames' original indexes, and two joining on the key columns, the first by setting `key` as the index in both frames and the second by using `on`.

jreback closed this in 57ea76f May 26, 2016
jreback added this to the 0.18.2 milestone May 26, 2016
@jreback
Contributor

jreback commented May 26, 2016

thanks @edublancas nice improvement!

Labels
Docs, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Development

Successfully merging this pull request may close these issues.

Confusing interpretation of what DataFrame.join does
3 participants