DOC: update the DataFrame.combine and DataFrame.combine_first docstrings #20237

Merged
merged 13 commits into from
Jul 7, 2018
Changes from 6 commits
107 changes: 92 additions & 15 deletions pandas/core/frame.py
@@ -4049,33 +4049,96 @@ def _compare(a, b):

def combine(self, other, func, fill_value=None, overwrite=True):
"""
Add two DataFrame objects and do not propagate NaN values, so if for a
(column, time) one frame is missing a value, it will default to the
other frame's value (which might be NaN as well)
Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to merge columns. The row and column indexes of the resulting
Contributor

"merge" is a very specific word, and I would rather not use it here. "Element-wise combine" is pretty descriptive.

DataFrame will be the union of the two. If `fill_value` is
Contributor

You are describing the parameters here, and they are already described below. I think your first two sentences are pretty good.

specified, that value will be filled prior to the call to
`func`. If `overwrite` is `False`, columns in `self` that
do not exist in `other` will be preserved.

Parameters
----------
other : DataFrame
The DataFrame to merge column-wise.
func : function
Function that takes two series as inputs and return a Series or a
scalar
fill_value : scalar value
scalar, used to merge the two dataframes column by columns.
fill_value : scalar value, default None
The value to fill NaNs with prior to passing any column to the
merge func.
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
If True, columns in `self` that do not exist in `other` will be
overwritten with NaNs.

Returns
-------
result : DataFrame

Examples
--------
Combine using a simple function that chooses the smaller column.
>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
Contributor

I would also show something like:

In [14]: df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...: 

In [15]: df1.combine(df2, np.minimum)
Out[15]: 
   A  B
0  0  3
1  0  3
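
A small sketch of what the suggestion above changes, assuming `func` is called once per pair of aligned columns: a column-level chooser such as `take_smaller` returns one whole column, whereas `np.minimum` compares the two columns value by value. The input values below are illustrative and deliberately differ from the docstring's, so that the two results diverge.

```python
import numpy as np
import pandas as pd

# Illustrative inputs (not the docstring's values).
df1 = pd.DataFrame({'A': [0, 5], 'B': [4, 1]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# Column-level choice: keep whichever whole column has the smaller sum.
take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
by_column = df1.combine(df2, take_smaller)

# Element-wise choice: np.minimum picks the smaller value in each position.
by_element = df1.combine(df2, np.minimum)

print(by_column)
print(by_element)
```

With these inputs the column-level version keeps `df2['A']` and `df1['B']` intact, while the element-wise version mixes values from both frames within a column.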

>>> df1.combine(df2, take_smaller)
A B
0 0 3
1 0 3

Using `fill_value` fills Nones prior to passing the column to the
merge function.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
A B
0 0 -5.0
Contributor

These should be aligned:

In [18]: df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...: df1.combine(df2, take_smaller, fill_value=-5)
    ...: 
Out[18]: 
   A    B
0  0 -5.0
1  0  4.0

1 0 4.0

However, if the same element in both dataframes is None, that None
is preserved

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
A B
0 0 NaN
1 0 3.0

Example that demonstrates the use of `overwrite` and the behavior when
the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
A B C
0 NaN NaN NaN
1 NaN 3.0 -10.0
2 NaN 3.0 1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
A B C
0 0.0 NaN NaN
Contributor

Please fix the alignment on these.

1 0.0 3.0 -10.0
2 NaN 3.0 1.0

Demonstrating the preference of the passed in dataframe.
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
Contributor

Can you be consistent about blank lines before an example (e.g. add one here)?

>>> df2.combine(df1, take_smaller)
A B C
0 0.0 NaN NaN
1 0.0 3.0 NaN
2 NaN 3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
A B C
0 0.0 NaN NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0

See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
@@ -4095,7 +4158,6 @@ def combine(self, other, func, fill_value=None, overwrite=True):
# sorts if possible
new_columns = this.columns.union(other.columns)
do_fill = fill_value is not None

result = {}
for col in new_columns:
series = this[col]
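
The context lines above hint at the column-wise loop behind `combine`. The following is a simplified sketch of that idea, not the pandas implementation: `combine_sketch` is a hypothetical helper that ignores `overwrite`, dtype promotion, and the special handling of elements that are null in both frames.

```python
import pandas as pd


def combine_sketch(df, other, func, fill_value=None):
    """Hypothetical, simplified column-wise combine (illustration only)."""
    # Align both frames on the union of their row and column labels.
    this, other = df.align(other)
    new_columns = this.columns.union(other.columns)
    do_fill = fill_value is not None

    result = {}
    for col in new_columns:
        series = this[col]
        other_series = other[col]
        if do_fill:
            # Fill missing values before handing the columns to func.
            series = series.fillna(fill_value)
            other_series = other_series.fillna(fill_value)
        result[col] = func(series, other_series)
    return pd.DataFrame(result, index=this.index, columns=new_columns)


take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1]}, index=[1, 2])
result = combine_sketch(df1, df2, take_smaller)
print(result)
```

Because both frames are aligned first, a column that exists in only one frame is paired with an all-NaN column from the other, which is why column `A` comes back entirely NaN in this sketch when `overwrite` is not modelled.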
@@ -4160,27 +4222,42 @@ def combine(self, other, func, fill_value=None, overwrite=True):

def combine_first(self, other):
"""
Combine two DataFrame objects and default to non-null values in frame
calling the method. Result index columns will be the union of the
respective indexes and columns
Update null elements with value in the same location in `other`.

Combine two DataFrame objects by filling null values in self DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
Provided DataFrame to use to fill null values.

Returns
-------
combined : DataFrame

Examples
--------

df1's values prioritized, use values from df2 to fill holes:
>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
A B
0 1.0 3.0
1 0.0 4.0

Illustrate the behavior when the axes differ between the dataframes.

>>> df1 = pd.DataFrame([[1, np.nan]])
>>> df2 = pd.DataFrame([[3, 4]])
>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df1.combine_first(df2)
0 1
0 1 4.0
A B C
0 NaN 4.0 NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0

See Also
--------
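
As a rough way to read the `combine_first` description in this diff, the result can be reproduced by aligning both frames on the union of their labels and then taking values from `other` only where `self` is null. This is a sketch of the semantics under that assumption, not a claim about how pandas implements the method; `left`, `right`, and `manual` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

combined = df1.combine_first(df2)

# Manual equivalent: align on the union of row and column labels,
# then keep df1's values wherever they are not null.
left, right = df1.align(df2)
manual = left.where(left.notna(), right)

print(combined)
print(manual)
```

Positions that are null in both frames, such as column `A` at row 0, stay null in either version.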