DOC: update the DataFrame.combine and DataFrame.combine_first docstrings #20237

Merged 13 commits on Jul 7, 2018

Contributor

@Michael-J-Ward Michael-J-Ward commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.combine) #####################
################################################################################

Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to merge columns. The row and column indexes of the resulting
DataFrame will be the union of the two. If `fill_value` is
specified, that value will be filled prior to the call to
`func`. If `overwrite` is `False`, columns in `self` that
do not exist in `other` will be preserved.

Parameters
----------
other : DataFrame
    The DataFrame to merge column-wise.
func : function
    Function that takes two Series as inputs and returns a Series or a
    scalar, used to merge the two dataframes column by column.
fill_value : scalar value, default None
    The value to fill NaNs with prior to passing any column to the
    merge func.
overwrite : boolean, default True
    If True, columns in `self` that do not exist in `other` will be
    overwritten with NaNs.

Returns
-------
result : DataFrame

Examples
--------
Combine using a simple function that chooses the smaller column.

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Using `fill_value` fills Nones prior to passing the column to the
merge function.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A  B
0  0  -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None
is preserved.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A  B
0  0  NaN
1  0  3.0

Example that demonstrates the use of `overwrite` and behavior when
the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
   A    B    C
0  NaN  NaN  NaN
1  NaN  3.0  -10.0
2  NaN  3.0  1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
   A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  -10.0
2  NaN  3.0  1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
    non-null values in frame calling the method

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine" correct. :)
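
As a cross-check of the `fill_value` example in the docstring above, here is a minimal standalone sketch. It assumes pandas is importable as `pd`; `take_smaller` is the docstring's lambda rewritten as a named function, and the variable name `filled` is illustrative only. The case where both frames hold a missing value at the same position is left out on purpose, since its result may differ between pandas versions.

```python
import pandas as pd


def take_smaller(s1, s2):
    """Return whichever column has the smaller sum (s2 on ties)."""
    return s1 if s1.sum() < s2.sum() else s2


df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# The missing value in df1['B'] is replaced by -5 before take_smaller
# sees the column, so df1['B'] sums to -1 and wins over df2['B'] (sum 6).
filled = df1.combine(df2, take_smaller, fill_value=-5)
print(filled)
```

The fill happens on the inputs, not on the result, which is why `-5.0` shows up in the output even though neither frame contained it.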

For DataFrame.combine_first


################################################################################
################## Docstring (pandas.DataFrame.combine_first) ##################
################################################################################

Update null elements with value in the same location in `other`.

Combine two DataFrame objects by filling null values in self DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    Provided DataFrame to use to fill null values.

Returns
-------
combined : DataFrame

Examples
--------

df1's values prioritized, use values from df2 to fill holes:

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
   A  B
0  1.0  3.0
1  0.0  4.0

Illustrate the behavior when the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df1.combine_first(df2)
   A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

See Also
--------
DataFrame.combine : Perform series-wise operation on two DataFrames
    using a given function

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine_first" correct. :)


@Michael-J-Ward
Contributor Author

I would like to give recognition to @qshng, who worked on this documentation with me.

Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to merge columns. The row and column indexes of the resulting
Contributor

"merge" is a very specific word and I would rather not use it here. "element-wise combine" is pretty descriptive.


Combines `self` DataFrame with `other` DataFrame using `func`
to merge columns. The row and column indexes of the resulting
DataFrame will be the union of the two. If `fill_value` is
Contributor

you are describing the parameters here (and they are described below). I think your first 2 sentences are pretty good.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
Contributor

would also show something like:

In [14]: df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...: 

In [15]: df1.combine(df2, np.minimum)
Out[15]: 
   A  B
0  0  3
1  0  3
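
To make the difference between the two kinds of function concrete, the sketch below runs both on the same pair of frames. It assumes pandas and NumPy are available; `take_smaller` is the lambda from the docstring written as a named function, and the data and the names `by_column` and `by_element` are chosen purely for illustration.

```python
import numpy as np
import pandas as pd


def take_smaller(s1, s2):
    """Return whichever column has the smaller sum (s2 on ties)."""
    return s1 if s1.sum() < s2.sum() else s2


df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# Column-wise choice: each output column is copied whole from one frame.
by_column = df1.combine(df2, take_smaller)

# Element-wise choice: np.minimum compares the two columns value by value.
by_element = df1.combine(df2, np.minimum)

print(by_column)
print(by_element)
```

With these inputs the column-wise version returns `df2` unchanged, while `np.minimum` mixes values from both frames within a single column.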

>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
A B
0 0 -5.0
Contributor

these should be aligned

In [18]: df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
    ...: df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
    ...: df1.combine(df2, take_smaller, fill_value=-5)
    ...: 
Out[18]: 
   A    B
0  0 -5.0
1  0  4.0

2 NaN 3.0 1.0

Demonstrating the preference of the passed in dataframe.
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
Contributor

can you be consistent about blank lines before an example (e.g. add one here)


>>> df1.combine(df2, take_smaller, overwrite=False)
A B C
0 0.0 NaN NaN
Contributor

alignment on these

@jreback added the Docs and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Mar 10, 2018
@Michael-J-Ward
Contributor Author

The validate_docstring output for DataFrame.combine

################################################################################
##################### Docstring (pandas.DataFrame.combine) #####################
################################################################################

Perform series-wise combine with `other` DataFrame using given `func`.

Combines `self` DataFrame with `other` DataFrame using `func`
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    The DataFrame to merge column-wise.
func : function
    Function that takes two Series as inputs and returns a Series or a
    scalar, used to merge the two dataframes column by column.
fill_value : scalar value, default None
    The value to fill NaNs with prior to passing any column to the
    merge func.
overwrite : boolean, default True
    If True, columns in `self` that do not exist in `other` will be
    overwritten with NaNs.

Returns
-------
result : DataFrame

Examples
--------
Combine using a simple function that chooses the smaller column.

>>> from pandas import DataFrame
>>> import numpy as np
>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> import numpy as np
>>> df1 = DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using `fill_value` fills Nones prior to passing the column to the
merge function.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None
is preserved.

>>> df1 = DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0  NaN
1  0  3.0

Example that demonstrates the use of `overwrite` and behavior when
the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [-10, 1],}, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
    non-null values in frame calling the method

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine" correct. :)
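
The `overwrite` examples in the docstring above can be reproduced with the following standalone sketch. It assumes pandas is importable as `pd`; `take_smaller` is again the docstring's lambda as a named function, and `overwritten` and `preserved` are illustrative names. The comments describe the behavior as observed in the docstring output, not a guarantee about pandas internals.

```python
import pandas as pd


def take_smaller(s1, s2):
    """Return whichever column has the smaller sum (s2 on ties)."""
    return s1 if s1.sum() < s2.sum() else s2


df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1]}, index=[1, 2])

# Column 'A' exists only in df1. After alignment, df2 contributes an
# all-NaN 'A' column, which take_smaller ends up selecting (a tie on sum).
overwritten = df1.combine(df2, take_smaller)

# With overwrite=False, a column that is entirely missing in `other`
# is kept from the calling frame instead.
preserved = df1.combine(df2, take_smaller, overwrite=False)

print(overwritten['A'].tolist())
print(preserved['A'].tolist())
```

In both calls the result is indexed by the union of the two indexes (0, 1, 2) and carries the union of the columns (A, B, C).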

And for combine_first

################################################################################
################## Docstring (pandas.DataFrame.combine_first) ##################
################################################################################

Update null elements with value in the same location in `other`.

Combine two DataFrame objects by filling null values in self DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.

Parameters
----------
other : DataFrame
    Provided DataFrame to use to fill null values.

Returns
-------
combined : DataFrame

Examples
--------

df1's values prioritized, use values from df2 to fill holes:

>>> from pandas import DataFrame
>>> df1 = DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Illustrate the behavior when the axes differ between the dataframes.

>>> df1 = DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = DataFrame({'B': [3, 3], 'C': [1, 1],}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

See Also
--------
DataFrame.combine : Perform series-wise operation on two DataFrames
    using a given function

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.combine_first" correct. :)
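
One way to read the `combine_first` docstring above is "align, then fill holes in the caller from `other`". The sketch below compares the method against a hand-rolled version of that idea. It assumes pandas is importable as `pd`; the `align` plus `where` formulation is an illustration written for this thread, not a description of how pandas implements the method, and dtypes may differ between the two results.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

combined = df1.combine_first(df2)

# Hand-rolled version for comparison: align both frames on the union of
# their labels, then keep df1's value wherever it is not missing.
left, right = df1.align(df2)
manual = left.where(left.notna(), right)

print(combined)
print(manual)
```

Positions that are missing in both frames, such as column A at index 0, stay missing in the result.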

@pep8speaks

pep8speaks commented Jul 7, 2018

Hello @Michael-J-Ward! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 07, 2018 at 19:58 Hours UTC

@jreback jreback added this to the 0.24.0 milestone Jul 7, 2018
@mroeschke
Member

Thanks @Michael-J-Ward!

(Travis error was from a prior lint error on master)

@mroeschke mroeschke merged commit c71b46a into pandas-dev:master Jul 7, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
…ngs (pandas-dev#20237)

* Added summary to `DataFrame.combine`. Corrected the extended summary. Added descriptions to parameters. Added examples to demonstrate quirks in usage.

* Added summary to `DataFrame.combine`. Corrected the extended summary. Added descriptions to parameters. Added examples to demonstrate quirks in usage.

* Added short summary to  and added examples to demonstrate behavior.

* pep8 formatting for the docstrings

* updated doctests so that they all pass for Dataframe.combine and Dataframe.combine_first

* updated docstrings on DataFrame.combine and DataFrame.combine_first for proper HTML formatting.

* updated output alignment and removed term merge from combine docstring- addressing review comments

* remove unneeded files and some edits

* forgot some pd

* flake8 and edit combine_first