DOC: Improve DataFrame.dropna subset example #35337


Merged: 1 commit, merged Jul 29, 2020

@TyMick (Contributor) commented Jul 18, 2020

In the examples for DataFrame.dropna, which begin like so:

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
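To see which cells each dropna call is reacting to, here is a small sketch (not part of the PR) that counts the missing values per row with DataFrame.isna, which treats both np.nan and pd.NaT as missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

# isna() flags np.nan in the object column and pd.NaT in the datetime column
missing = df.isna()
print(missing)

# Missing cells per row: Alfred 2, Batman 0, Catwoman 1
missing_per_row = missing.sum(axis=1)
print(missing_per_row.tolist())  # [2, 0, 1]
```

Every row except Batman's has at least one missing cell, which is why a bare df.dropna() keeps only row 1.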

the example that uses the subset parameter,

>>> df.dropna(subset=['name', 'born'])
       name        toy       born
1    Batman  Batmobile 1940-04-25

yields the same output as the example that doesn't pass in any arguments,

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
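The redundancy can be checked directly. This sketch (an illustration, not code from the PR) compares the original subset example with a bare dropna() using DataFrame.equals:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

old_subset_example = df.dropna(subset=['name', 'born'])
no_args_example = df.dropna()

# Rows 0 and 2 both have NaT in "born", so both calls drop them
print(old_subset_example.equals(no_args_example))  # True
```

Because every row with any missing value also has one in born, restricting the check to name and born changes nothing for this particular DataFrame.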

I think changing the subset example to

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

would better illustrate the concept: its result keeps the Catwoman row even though it has a missing value, because that value is in a column (born) that isn't listed in subset.
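One way to read subset: with the default how='any', a row is dropped only if one of the listed columns is missing, and other columns are ignored. The sketch below writes that as an explicit boolean mask and checks it against dropna; it is an illustrative equivalent, not a claim about how pandas implements dropna internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

subset = ['name', 'toy']

# Keep a row only if every column in `subset` is non-missing;
# "born" is outside the subset, so its NaT values are ignored
mask = df[subset].notna().all(axis=1)
manual = df[mask]

via_dropna = df.dropna(subset=subset)
print(via_dropna.equals(manual))       # True
print(list(via_dropna.index))          # [1, 2]
print(via_dropna.equals(df.dropna()))  # False
```

Unlike the name/born subset, this result differs from a bare dropna(), which is what makes it a useful example.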


I'm not able to test the docstring locally at the moment, but I ran all of the examples in my console, through the new one, and they worked as expected.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
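Since the docstring couldn't be tested locally, one lightweight check is to feed the new example text to the standard-library doctest module. This is a sketch under that assumption, not pandas' own docstring-validation workflow; the EXAMPLES string and the "dropna_subset" test name are hypothetical.

```python
import doctest

import numpy as np
import pandas as pd

# The new docstring example, copied as plain doctest text
EXAMPLES = '''
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
'''

# Parse the text into a DocTest with pd and np available as globals
parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLES, {"pd": pd, "np": np},
                          "dropna_subset", None, 0)

# NORMALIZE_WHITESPACE keeps the check from hinging on exact column padding
runner = doctest.DocTestRunner(verbose=False,
                               optionflags=doctest.NORMALIZE_WHITESPACE)
result = runner.run(test)
print(result)  # TestResults(failed=0, attempted=2)
```

A failing example is reported on stdout by the runner; this is a quick sanity check, not a substitute for the project's own docstring checks.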

@simonjayhawkins (Member) left a comment

Thanks @tywmick for the PR. LGTM, pending green CI.

@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Jul 18, 2020
@simonjayhawkins simonjayhawkins merged commit 725900f into pandas-dev:master Jul 29, 2020
@simonjayhawkins (Member) commented: Thanks @tywmick

@TyMick TyMick deleted the dropna-doc-example branch July 31, 2020 15:58