DOC: update the DataFrame.combine and DataFrame.combine_first docstrings #20237

Michael-J-Ward · 2018-03-10T21:04:46Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

- PR title is "DOC: update the docstring"
- The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
- The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
- The html version looks good: python doc/make.py --single <your-function-or-method>
- It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.combine) #####################
################################################################################

Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to merge columns. The row and column indexes of the resulting
DataFrame will be the union of the two. If `fill_value` is
specified, that value will be filled prior to the call to
`func`. If `overwrite` is `False`, columns in `self` that
do not exist in `other` will be preserved.

Parameters
----------
other : DataFrame
    The DataFrame to merge column-wise.
func : function
    Function that takes two Series as inputs and returns a Series or a
    scalar, used to merge the two dataframes column by column.
fill_value : scalar value, default None
    The value to fill NaNs with prior to passing any column to the
    merge func.
overwrite : boolean, default True
    If True, columns in `self` that do not exist in `other` will be
    overwritten with NaNs.

Returns
-------
result : DataFrame

Examples
--------
Combine using a simple function that chooses the smaller column.

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Using `fill_value` fills Nones prior to passing the column to the
merge function.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A  B
0  0  -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None
is preserved.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A  B
0  0  NaN
1  0  3.0

Example that demonstrates the use of `overwrite` and the behavior when
the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
   A    B    C
0  NaN  NaN  NaN
1  NaN  3.0  -10.0
2  NaN  3.0  1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
   A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  -10.0
2  NaN  3.0  1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
    non-null values in frame calling the method

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine" correct. :)
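For readers who want to try the `overwrite` behavior outside the docstring, here is a minimal sketch that reuses the frames from the example above. The `pd` alias and the names `default` and `kept` are illustrative choices, not part of the PR.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1]}, index=[1, 2])


def take_smaller(s1, s2):
    # Keep whichever whole column has the smaller sum (NaNs are skipped).
    return s1 if s1.sum() < s2.sum() else s2


# Default (overwrite=True): column 'A' exists only in df1, so it comes
# back entirely as NaN.
default = df1.combine(df2, take_smaller)

# overwrite=False: df1's values in 'A' are kept; only the new row
# (index 2) is NaN.
kept = df1.combine(df2, take_smaller, overwrite=False)

print(default)
print(kept)
```

In both results the row and column labels are the union of the two inputs, which is why row 2 and column C appear even though `df1` has neither.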

For DataFrame.combine_first


################################################################################
################## Docstring (pandas.DataFrame.combine_first) ##################
################################################################################

Update null elements with value in the same location in `other`.

Combine two DataFrame objects by filling null values in self DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    Provided DataFrame to use to fill null values.

Returns
-------
combined : DataFrame

Examples
--------

df1's values prioritized, use values from df2 to fill holes:

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
   A  B
0  1.0  3.0
1  0.0  4.0

Illustrate the behavior when the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df1.combine_first(df2)
   A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

See Also
--------
DataFrame.combine : Perform series-wise operation on two DataFrames
    using a given function

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine_first" correct. :)
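One way to picture what `combine_first` does is to align both frames on the union of their labels and then keep the caller's value wherever it is not null. The sketch below compares that manual approach with `combine_first` on the frames from the second example. It is only an illustration under that assumption, not necessarily how pandas implements the method, and the names `result`, `manual`, `a` and `b` are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

result = df1.combine_first(df2)

# Manual equivalent: reindex both frames to the union of rows and columns,
# then take df1's value where present and df2's value otherwise.
a, b = df1.align(df2, join='outer')
manual = a.where(a.notna(), b)

print(result)
print(manual)
```

Cells that are null in both frames, such as row 0 of column C, stay null in either approach.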

Michael-J-Ward · 2018-03-10T21:09:12Z

I would like to give recognition to @qshng, who worked on this documentation with me.

jreback · 2018-03-10T21:18:14Z

pandas/core/frame.py

+        Perform series-wise combine with `other` DataFrame using given `func`.
+
+        Combines `self` DataFrame with `other` DataFrame using `func`
+        to merge columns. The row and column indexes of the resulting


"merge" is a very specific word and I would rather not use it here. "element-wise combine" is pretty descriptive.

jreback · 2018-03-10T21:19:07Z

pandas/core/frame.py

+
+        Combines `self` DataFrame with `other` DataFrame using `func`
+        to merge columns. The row and column indexes of the resulting
+        DataFrame will be the union of the two. If `fill_value` is


you are describing the parameters here (and they are described below). I think your first 2 sentences are pretty good.

jreback · 2018-03-10T21:20:47Z

pandas/core/frame.py

        >>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
        >>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
-        >>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
+        >>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2


would also show something like:

In [14]: df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...:

In [15]: df1.combine(df2, np.minimum)
Out[15]:
   A  B
0  0  3
1  0  3
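The difference between a column-wise function such as `take_smaller` and an element-wise one such as `np.minimum` only shows up when neither column is smaller in every row. A small sketch with such data follows. The frames and the names `by_column` and `by_element` are chosen for illustration and do not come from the PR.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 5]})


def take_smaller(s1, s2):
    # Column-wise choice: return one whole column, the one with the smaller sum.
    return s1 if s1.sum() < s2.sum() else s2


# Column 'A' is taken entirely from df2 because 1 + 1 < 5 + 0.
by_column = df1.combine(df2, take_smaller)

# np.minimum compares the two columns value by value, so column 'A'
# mixes values from both frames.
by_element = df1.combine(df2, np.minimum)

print(by_column)
print(by_element)
```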

jreback · 2018-03-10T21:22:05Z

pandas/core/frame.py

+        >>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
+        >>> df1.combine(df2, take_smaller, fill_value=-5)
+           A  B
+        0  0  -5.0


these should be aligned

In [18]: df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...: df1.combine(df2, take_smaller, fill_value=-5)
    ...:
Out[18]:
   A    B
0  0 -5.0
1  0  4.0

jreback · 2018-03-10T21:22:30Z

pandas/core/frame.py

+        2  NaN  3.0  1.0
+
+        Demonstrating the preference of the passed in dataframe.
+        >>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])


can you be consistent about blank lines before an example (e.g. add one here)

jreback · 2018-03-10T21:22:39Z

pandas/core/frame.py

+
+        >>> df1.combine(df2, take_smaller, overwrite=False)
+           A    B    C
+        0  0.0  NaN  NaN


alignment on these

Michael-J-Ward · 2018-03-11T13:55:59Z

The validate_docstring output for DataFrame.combine

################################################################################
##################### Docstring (pandas.DataFrame.combine) #####################
################################################################################

Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    The DataFrame to merge column-wise.
func : function
    Function that takes two Series as inputs and returns a Series or a
    scalar, used to merge the two dataframes column by column.
fill_value : scalar value, default None
    The value to fill NaNs with prior to passing any column to the
    merge func.
overwrite : boolean, default True
    If True, columns in `self` that do not exist in `other` will be
    overwritten with NaNs.

Returns
-------
result : DataFrame

Examples
--------
Combine using a simple function that chooses the smaller column.

>>> from pandas import DataFrame
>>> import numpy as np
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> import numpy as np
>>> df1 = DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using `fill_value` fills Nones prior to passing the column to the
merge function.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None
is preserved.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0  NaN
1  0  3.0

Example that demonstrates the use of `overwrite` and the behavior when
the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
    non-null values in frame calling the method

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine" correct. :)

And for combine_first

################################################################################
################## Docstring (pandas.DataFrame.combine_first) ##################
################################################################################

Update null elements with value in the same location in `other`.

Combine two DataFrame objects by filling null values in self DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    Provided DataFrame to use to fill null values.

Returns
-------
combined : DataFrame

Examples
--------

df1's values prioritized, use values from df2 to fill holes:

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Illustrate the behavior when the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

See Also
--------
DataFrame.combine : Perform series-wise operation on two DataFrames
    using a given function

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine_first" correct. :)

pep8speaks · 2018-07-07T19:29:06Z

Hello @Michael-J-Ward! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 07, 2018 at 19:58 Hours UTC

mroeschke · 2018-07-07T22:17:24Z

Thanks @Michael-J-Ward!

(Travis error was from a prior lint error on master)

…ngs (pandas-dev#20237)

* Added summary to `DataFrame.combine`. Corrected the extended summary. Added descriptions to parameters. Added examples to demonstrate quirks in usage.
* Added summary to `DataFrame.combine`. Corrected the extended summary. Added descriptions to parameters. Added examples to demonstrate quirks in usage.
* Added short summary to and added examples to demonstrate behavior.
* pep8 formatting for the docstrings
* updated doctests so that they all pass for Dataframe.combine and Dataframe.combine_first
* updated docstrings on DataFrame.combine and DataFrame.combine_first for proper HTML formatting.
* updated output alignment and removed term merge from combine docstring- addressing review comments
* remove unneeded files and some edits
* forgot some pd
* flake8 and edit combine_first

Michael-J-Ward added 7 commits March 10, 2018 13:40

Added summary to DataFrame.combine. Corrected the extended summary.…

414c8d8

… Added descriptions to parameters. Added examples to demonstrate quirks in usage.

Added summary to DataFrame.combine. Corrected the extended summary.…

de10138

… Added descriptions to parameters. Added examples to demonstrate quirks in usage.

Merge branch 'document_frame_combine' of github.com:Michael-J-Ward/pa…

2f9b030

…ndas into document_frame_combine

Added short summary to and added examples to demonstrate behavior.

c2f4b56

pep8 formatting for the docstrings

596535e

updated doctests so that they all pass for Dataframe.combine and Data…

ba7af38

…frame.combine_first

updated docstrings on DataFrame.combine and DataFrame.combine_first f…

f3b8051

…or proper HTML formatting.

jreback requested changes Mar 10, 2018

jreback added the Docs and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Mar 10, 2018

updated output alignment and removed term merge from combine docstrin…

29833a3

…g- addressing review comments

Matt Roeschke added 2 commits July 7, 2018 14:16

Merge remote-tracking branch 'upstream/master' into document_frame_co…

1618a36

…mbine

remove unneeded files and some edits

5cc5856

Matt Roeschke added 2 commits July 7, 2018 14:31

forgot some pd

1c7aff9

flake8 and edit combine_first

fbc3207

jreback added this to the 0.24.0 milestone Jul 7, 2018

jreback approved these changes Jul 7, 2018

Merge remote-tracking branch 'upstream/master' into document_frame_co…

e969451

…mbine

mroeschke merged commit c71b46a into pandas-dev:master Jul 7, 2018
