DOC: update the DataFrame.combine and DataFrame.combine_first docstrings #20237
Changes from 6 commits
414c8d8
de10138
2f9b030
c2f4b56
596535e
ba7af38
f3b8051
29833a3
1618a36
5cc5856
1c7aff9
fbc3207
e969451
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4049,33 +4049,96 @@ def _compare(a, b): | |
|
||
def combine(self, other, func, fill_value=None, overwrite=True): | ||
""" | ||
Add two DataFrame objects and do not propagate NaN values, so if for a | ||
(column, time) one frame is missing a value, it will default to the | ||
other frame's value (which might be NaN as well) | ||
Perform series-wise combine with `other` DataFrame using given `func`. | ||
|
||
Combines `self` DataFrame with `other` DataFrame using `func` | ||
to merge columns. The row and column indexes of the resulting | ||
DataFrame will be the union of the two. If `fill_value` is | ||
Review comment: You are describing the parameters here (and they are described below). I think your first 2 sentences are pretty good.
||
specified, that value will be filled prior to the call to | ||
`func`. If `overwrite` is `False`, columns in `self` that | ||
do not exist in `other` will be preserved. | ||
|
||
Parameters | ||
---------- | ||
other : DataFrame | ||
The DataFrame to merge column-wise. | ||
func : function | ||
Function that takes two series as inputs and returns a Series or a | ||
scalar | ||
fill_value : scalar value | ||
scalar, used to merge the two dataframes column by column. | ||
fill_value : scalar value, default None | ||
The value to fill NaNs with prior to passing any column to the | ||
merge func. | ||
overwrite : boolean, default True | ||
If True then overwrite values for common keys in the calling frame | ||
If True, columns in `self` that do not exist in `other` will be | ||
overwritten with NaNs. | ||
|
||
Returns | ||
------- | ||
result : DataFrame | ||
|
||
Examples | ||
-------- | ||
Combine using a simple function that chooses the smaller column. | ||
>>> from pandas import DataFrame | ||
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]}) | ||
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]}) | ||
>>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2) | ||
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2 | ||
Review comment: would also show something like:
|
||
>>> df1.combine(df2, take_smaller) | ||
A B | ||
0 0 3 | ||
1 0 3 | ||
|
||
Using `fill_value` fills Nones prior to passing the column to the | ||
merge function. | ||
|
||
>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]}) | ||
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]}) | ||
>>> df1.combine(df2, take_smaller, fill_value=-5) | ||
A B | ||
0 0 -5.0 | ||
Review comment: These should be aligned.
|
||
1 0 4.0 | ||
|
||
However, if the same element in both dataframes is None, that None | ||
is preserved | ||
|
||
>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]}) | ||
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]}) | ||
>>> df1.combine(df2, take_smaller, fill_value=-5) | ||
A B | ||
0 0 NaN | ||
1 0 3.0 | ||
|
||
Example that demonstrates the use of `overwrite` and behavior when | ||
the axes differ between the dataframes. | ||
|
||
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]}) | ||
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2]) | ||
>>> df1.combine(df2, take_smaller) | ||
A B C | ||
0 NaN NaN NaN | ||
1 NaN 3.0 -10.0 | ||
2 NaN 3.0 1.0 | ||
|
||
>>> df1.combine(df2, take_smaller, overwrite=False) | ||
A B C | ||
0 0.0 NaN NaN | ||
Review comment: Alignment on these.
||
1 0.0 3.0 -10.0 | ||
2 NaN 3.0 1.0 | ||
|
||
Demonstrating the preference of the passed in dataframe. | ||
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2]) | ||
Review comment: Can you be consistent about blank lines before an example (e.g. add one here)?
||
>>> df2.combine(df1, take_smaller) | ||
A B C | ||
0 0.0 NaN NaN | ||
1 0.0 3.0 NaN | ||
2 NaN 3.0 NaN | ||
|
||
>>> df2.combine(df1, take_smaller, overwrite=False) | ||
A B C | ||
0 0.0 NaN NaN | ||
1 0.0 3.0 1.0 | ||
2 NaN 3.0 1.0 | ||
|
||
See Also | ||
-------- | ||
DataFrame.combine_first : Combine two DataFrame objects and default to | ||
|
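A minimal sketch of the behavior the new `combine` docstring describes, reusing the `take_smaller` helper and the frames from the examples above. It assumes a reasonably recent pandas imported as `pd`; the names `df3`, `smaller`, `overwritten` and `preserved` are illustrative only and do not appear in the PR.

```python
import pandas as pd


def take_smaller(s1, s2):
    """Return whichever of the two columns has the smaller sum."""
    return s1 if s1.sum() < s2.sum() else s2


# Same labels on both frames: each column is chosen independently.
df1 = pd.DataFrame({"A": [0, 0], "B": [4, 4]})
df2 = pd.DataFrame({"A": [1, 1], "B": [3, 3]})
smaller = df1.combine(df2, take_smaller)

# Different labels: the result covers the union of rows and columns.
df3 = pd.DataFrame({"B": [3, 3], "C": [-10, 1]}, index=[1, 2])
overwritten = df1.combine(df3, take_smaller)  # column A becomes NaN
preserved = df1.combine(df3, take_smaller, overwrite=False)  # column A kept

print(smaller, overwritten, preserved, sep="\n\n")
```

With `overwrite=False`, column `A` keeps the values from `df1` because `df3` has no column `A` to combine with.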
@@ -4095,7 +4158,6 @@ def combine(self, other, func, fill_value=None, overwrite=True): | |
# sorts if possible | ||
new_columns = this.columns.union(other.columns) | ||
do_fill = fill_value is not None | ||
|
||
result = {} | ||
for col in new_columns: | ||
series = this[col] | ||
|
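The column loop in this hunk can be pictured with a simplified model. This is a sketch under stated assumptions, not the pandas implementation: `combine_sketch` is a hypothetical helper that ignores `overwrite` and dtype promotion, and it fills missing values with `fillna` rather than by mask assignment.

```python
import pandas as pd


def combine_sketch(left, right, func, fill_value=None):
    """Hypothetical, simplified model of a column-wise combine."""
    # Align both frames on the union of their row and column labels.
    this, other = left.align(right)
    new_columns = this.columns.union(other.columns)
    do_fill = fill_value is not None

    result = {}
    for col in new_columns:
        series = this[col]
        other_series = other[col]
        if do_fill:
            # Fill missing values before handing the columns to func.
            series = series.fillna(fill_value)
            other_series = other_series.fillna(fill_value)
        result[col] = func(series, other_series)
    return pd.DataFrame(result, index=this.index, columns=new_columns)


def take_smaller(s1, s2):
    return s1 if s1.sum() < s2.sum() else s2


left = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
right = pd.DataFrame({"A": [1, 1], "B": [3, 3]})
sketch_result = combine_sketch(left, right, take_smaller, fill_value=-5)
print(sketch_result)
```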
@@ -4160,27 +4222,42 @@ def combine(self, other, func, fill_value=None, overwrite=True): | |
|
||
def combine_first(self, other): | ||
""" | ||
Combine two DataFrame objects and default to non-null values in frame | ||
calling the method. Result index columns will be the union of the | ||
respective indexes and columns | ||
Update null elements with value in the same location in `other`. | ||
|
||
Combine two DataFrame objects by filling null values in self DataFrame | ||
with non-null values from other DataFrame. The row and column indexes | ||
of the resulting DataFrame will be the union of the two. | ||
|
||
Parameters | ||
---------- | ||
other : DataFrame | ||
Provided DataFrame to use to fill null values. | ||
|
||
Returns | ||
------- | ||
combined : DataFrame | ||
|
||
Examples | ||
-------- | ||
|
||
df1's values prioritized, use values from df2 to fill holes: | ||
>>> from pandas import DataFrame | ||
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]}) | ||
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]}) | ||
>>> df1.combine_first(df2) | ||
A B | ||
0 1.0 3.0 | ||
1 0.0 4.0 | ||
|
||
Illustrate the behavior when the axes differ between the dataframes. | ||
|
||
>>> df1 = pd.DataFrame([[1, np.nan]]) | ||
>>> df2 = pd.DataFrame([[3, 4]]) | ||
>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]}) | ||
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2]) | ||
>>> df1.combine_first(df2) | ||
0 1 | ||
0 1 4.0 | ||
A B C | ||
0 NaN 4.0 NaN | ||
1 0.0 3.0 1.0 | ||
2 NaN 3.0 1.0 | ||
|
||
See Also | ||
-------- | ||
|
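A short sketch of the `combine_first` behavior described in the new docstring, using the same frames as its examples. It assumes pandas imported as `pd`; the names `df3`, `df4`, `filled` and `unioned` are illustrative and not part of the PR.

```python
import pandas as pd

# Same labels: nulls in df1 are filled from the same location in df2.
df1 = pd.DataFrame({"A": [None, 0], "B": [None, 4]})
df2 = pd.DataFrame({"A": [1, 1], "B": [3, 3]})
filled = df1.combine_first(df2)

# Different labels: the result covers the union of rows and columns,
# and positions missing from both frames stay NaN.
df3 = pd.DataFrame({"A": [None, 0], "B": [4, None]})
df4 = pd.DataFrame({"B": [3, 3], "C": [1, 1]}, index=[1, 2])
unioned = df3.combine_first(df4)

print(filled, unioned, sep="\n\n")
```

Values already present in the calling frame always win; `other` only supplies values where the caller has a null or no label at all.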
Review comment: "merge" is a very specific word and I would rather not use it here. "Element-wise combine" is pretty descriptive.