DOC: Fix PEP-8 issues in computation.rst and comparison_*.rst #24002

Merged (4 commits, Dec 2, 2018)
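
Most of the changes in this diff are mechanical PEP-8 cleanups in the doc code
blocks: whitespace added around operators and after commas, long lines wrapped,
and multi-line dict literals re-indented. As a minimal sketch of the pattern,
the "before" statements below mirror two lines touched in this diff, and the
comments name the flake8 codes that usually flag them (E225, E231; the PR
itself does not cite the codes):

    import numpy as np
    import pandas as pd

    # Before: spacing flake8 would flag, e.g. missing whitespace around '='
    # (E225) and after ',' (E231).
    pd.options.display.max_rows=15
    s = pd.Series(np.arange(5),dtype=np.float32)

    # After: the same statements with PEP-8 compliant spacing, as applied
    # in the updated docs.
    pd.options.display.max_rows = 15
    s = pd.Series(np.arange(5), dtype=np.float32)
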
88 changes: 43 additions & 45 deletions doc/source/comparison_with_r.rst
@@ -6,7 +6,7 @@

import pandas as pd
import numpy as np
pd.options.display.max_rows=15
pd.options.display.max_rows = 15

Comparison with R / R libraries
*******************************
@@ -165,16 +165,15 @@ function.

.. ipython:: python

df = pd.DataFrame({
'v1': [1,3,5,7,8,3,5,np.nan,4,5,7,9],
'v2': [11,33,55,77,88,33,55,np.nan,44,55,77,99],
'by1': ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
'by2': ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99, np.nan,
np.nan]
})
df = pd.DataFrame(
{'v1': [1, 3, 5, 7, 8, 3, 5, np.nan, 4, 5, 7, 9],
'v2': [11, 33, 55, 77, 88, 33, 55, np.nan, 44, 55, 77, 99],
'by1': ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
'by2': ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99, np.nan,
np.nan]})

g = df.groupby(['by1','by2'])
g[['v1','v2']].mean()
g = df.groupby(['by1', 'by2'])
g[['v1', 'v2']].mean()

For more details and examples see :ref:`the groupby documentation
<groupby.split>`.
@@ -195,7 +194,7 @@ The :meth:`~pandas.DataFrame.isin` method is similar to R ``%in%`` operator:

.. ipython:: python

s = pd.Series(np.arange(5),dtype=np.float32)
s = pd.Series(np.arange(5), dtype=np.float32)
s.isin([2, 4])

The ``match`` function returns a vector of the positions of matches
@@ -234,11 +233,11 @@ In ``pandas`` we may use :meth:`~pandas.pivot_table` method to handle this:
import random
import string

baseball = pd.DataFrame({
'team': ["team %d" % (x+1) for x in range(5)]*5,
'player': random.sample(list(string.ascii_lowercase),25),
'batting avg': np.random.uniform(.200, .400, 25)
})
baseball = pd.DataFrame(
{'team': ["team %d" % (x + 1) for x in range(5)] * 5,
'player': random.sample(list(string.ascii_lowercase), 25),
'batting avg': np.random.uniform(.200, .400, 25)})

baseball.pivot_table(values='batting avg', columns='team', aggfunc=np.max)

For more details and examples see :ref:`the reshaping documentation
@@ -341,15 +340,13 @@ In ``pandas`` the equivalent expression, using the

.. ipython:: python

df = pd.DataFrame({
'x': np.random.uniform(1., 168., 120),
'y': np.random.uniform(7., 334., 120),
'z': np.random.uniform(1.7, 20.7, 120),
'month': [5,6,7,8]*30,
'week': np.random.randint(1,4, 120)
})
df = pd.DataFrame({'x': np.random.uniform(1., 168., 120),
'y': np.random.uniform(7., 334., 120),
'z': np.random.uniform(1.7, 20.7, 120),
'month': [5, 6, 7, 8] * 30,
'week': np.random.randint(1, 4, 120)})

grouped = df.groupby(['month','week'])
grouped = df.groupby(['month', 'week'])
grouped['x'].agg([np.mean, np.std])


@@ -374,8 +371,8 @@ In Python, since ``a`` is a list, you can simply use list comprehension.

.. ipython:: python

a = np.array(list(range(1,24))+[np.NAN]).reshape(2,3,4)
pd.DataFrame([tuple(list(x)+[val]) for x, val in np.ndenumerate(a)])
a = np.array(list(range(1, 24)) + [np.NAN]).reshape(2, 3, 4)
pd.DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)])

|meltlist|_
~~~~~~~~~~~~
@@ -393,7 +390,7 @@ In Python, this list would be a list of tuples, so

.. ipython:: python

a = list(enumerate(list(range(1,5))+[np.NAN]))
a = list(enumerate(list(range(1, 5)) + [np.NAN]))
pd.DataFrame(a)

For more details and examples see :ref:`the Intro to Data Structures
@@ -419,12 +416,13 @@ In Python, the :meth:`~pandas.melt` method is the R equivalent:

.. ipython:: python

cheese = pd.DataFrame({'first' : ['John', 'Mary'],
'last' : ['Doe', 'Bo'],
'height' : [5.5, 6.0],
'weight' : [130, 150]})
cheese = pd.DataFrame({'first': ['John', 'Mary'],
'last': ['Doe', 'Bo'],
'height': [5.5, 6.0],
'weight': [130, 150]})

pd.melt(cheese, id_vars=['first', 'last'])
cheese.set_index(['first', 'last']).stack() # alternative way
cheese.set_index(['first', 'last']).stack() # alternative way

For more details and examples see :ref:`the reshaping documentation
<reshaping.melt>`.
@@ -452,16 +450,15 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:

.. ipython:: python

df = pd.DataFrame({
'x': np.random.uniform(1., 168., 12),
'y': np.random.uniform(7., 334., 12),
'z': np.random.uniform(1.7, 20.7, 12),
'month': [5,6,7]*4,
'week': [1,2]*6
})
df = pd.DataFrame({'x': np.random.uniform(1., 168., 12),
'y': np.random.uniform(7., 334., 12),
'z': np.random.uniform(1.7, 20.7, 12),
'month': [5, 6, 7] * 4,
'week': [1, 2] * 6})

mdf = pd.melt(df, id_vars=['month', 'week'])
pd.pivot_table(mdf, values='value', index=['variable','week'],
columns=['month'], aggfunc=np.mean)
pd.pivot_table(mdf, values='value', index=['variable', 'week'],
columns=['month'], aggfunc=np.mean)

Similarly for ``dcast`` which uses a data.frame called ``df`` in R to
aggregate information based on ``Animal`` and ``FeedType``:
@@ -491,13 +488,14 @@ using :meth:`~pandas.pivot_table`:
'Amount': [10, 7, 4, 2, 5, 6, 2],
})

df.pivot_table(values='Amount', index='Animal', columns='FeedType', aggfunc='sum')
df.pivot_table(values='Amount', index='Animal', columns='FeedType',
aggfunc='sum')

The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:

.. ipython:: python

df.groupby(['Animal','FeedType'])['Amount'].sum()
df.groupby(['Animal', 'FeedType'])['Amount'].sum()

For more details and examples see :ref:`the reshaping documentation
<reshaping.pivot>` or :ref:`the groupby documentation<groupby.split>`.
@@ -516,8 +514,8 @@ In pandas this is accomplished with ``pd.cut`` and ``astype("category")``:

.. ipython:: python

pd.cut(pd.Series([1,2,3,4,5,6]), 3)
pd.Series([1,2,3,2,2,3]).astype("category")
pd.cut(pd.Series([1, 2, 3, 4, 5, 6]), 3)
pd.Series([1, 2, 3, 2, 2, 3]).astype("category")

For more details and examples see :ref:`categorical introduction <categorical>` and the
:ref:`API documentation <api.categorical>`. There is also a documentation regarding the
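
The comparison_with_r.rst hunks above only rewrap dict literals and fix
spacing; they do not change what the examples compute. A minimal sketch (not
part of the PR) of the equivalence being preserved:

    import pandas as pd

    # Old layout: dict literal opened on the pd.DataFrame( line.
    old_style = pd.DataFrame({
        'v1': [1, 3, 5, 7],
        'by1': ["red", "blue", 1, 2]
    })

    # New layout: the dict starts on its own continuation line, matching the
    # style used in the updated docs.
    new_style = pd.DataFrame(
        {'v1': [1, 3, 5, 7],
         'by1': ["red", "blue", 1, 2]})

    # Both spellings build the same DataFrame.
    pd.testing.assert_frame_equal(old_style, new_style)
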
20 changes: 9 additions & 11 deletions doc/source/comparison_with_sql.rst
@@ -23,7 +23,8 @@ structure.

.. ipython:: python

url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
url = ('https://raw.github.com/pandas-dev'
'/pandas/master/pandas/tests/data/tips.csv')
tips = pd.read_csv(url)
tips.head()

@@ -387,7 +388,7 @@ Top N rows with offset

.. ipython:: python

tips.nlargest(10+5, columns='tip').tail(10)
tips.nlargest(10 + 5, columns='tip').tail(10)

Top N rows per group
~~~~~~~~~~~~~~~~~~~~
@@ -411,8 +412,7 @@ Top N rows per group
.groupby(['day'])
.cumcount() + 1)
.query('rn < 3')
.sort_values(['day','rn'])
)
.sort_values(['day', 'rn']))

the same using `rank(method='first')` function

@@ -421,8 +421,7 @@ the same using `rank(method='first')` function
(tips.assign(rnk=tips.groupby(['day'])['total_bill']
.rank(method='first', ascending=False))
.query('rnk < 3')
.sort_values(['day','rnk'])
)
.sort_values(['day', 'rnk']))

.. code-block:: sql

@@ -445,11 +444,10 @@ Notice that when using ``rank(method='min')`` function
.. ipython:: python

(tips[tips['tip'] < 2]
.assign(rnk_min=tips.groupby(['sex'])['tip']
.rank(method='min'))
.query('rnk_min < 3')
.sort_values(['sex','rnk_min'])
)
.assign(rnk_min=tips.groupby(['sex'])['tip']
.rank(method='min'))
.query('rnk_min < 3')
.sort_values(['sex', 'rnk_min']))


UPDATE
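
In comparison_with_sql.rst the main change besides spacing is splitting the
long tips.csv URL across two adjacent string literals to satisfy the
line-length limit. Python concatenates adjacent string literals at parse time,
so the wrapped form yields exactly the same value; a quick sketch:

    # The wrapped form used in the updated docs is the same string as the
    # original single long line.
    url_one_line = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
    url_wrapped = ('https://raw.github.com/pandas-dev'
                   '/pandas/master/pandas/tests/data/tips.csv')
    assert url_one_line == url_wrapped
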
23 changes: 11 additions & 12 deletions doc/source/comparison_with_stata.rst
@@ -102,9 +102,7 @@ and the values are the data.

.. ipython:: python

df = pd.DataFrame({
'x': [1, 3, 5],
'y': [2, 4, 6]})
df = pd.DataFrame({'x': [1, 3, 5], 'y': [2, 4, 6]})
df


@@ -128,7 +126,8 @@ the data set if presented with a url.

.. ipython:: python

url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
url = ('https://raw.github.com/pandas-dev'
'/pandas/master/pandas/tests/data/tips.csv')
tips = pd.read_csv(url)
tips.head()

@@ -278,17 +277,17 @@ see the :ref:`timeseries documentation<timeseries>` for more details.
tips['date1_year'] = tips['date1'].dt.year
tips['date2_month'] = tips['date2'].dt.month
tips['date1_next'] = tips['date1'] + pd.offsets.MonthBegin()
tips['months_between'] = (tips['date2'].dt.to_period('M') -
tips['date1'].dt.to_period('M'))
tips['months_between'] = (tips['date2'].dt.to_period('M')
- tips['date1'].dt.to_period('M'))

tips[['date1','date2','date1_year','date2_month',
'date1_next','months_between']].head()
tips[['date1', 'date2', 'date1_year', 'date2_month', 'date1_next',
'months_between']].head()

.. ipython:: python
:suppress:

tips = tips.drop(['date1','date2','date1_year',
'date2_month','date1_next','months_between'], axis=1)
tips = tips.drop(['date1', 'date2', 'date1_year', 'date2_month',
'date1_next', 'months_between'], axis=1)

Selection of Columns
~~~~~~~~~~~~~~~~~~~~
@@ -472,7 +471,7 @@ The following tables will be used in the merge examples
'value': np.random.randn(4)})
df1
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
'value': np.random.randn(4)})
'value': np.random.randn(4)})
df2

In Stata, to perform a merge, one data set must be in memory
@@ -661,7 +660,7 @@ In pandas this would be written as:

.. ipython:: python

tips.groupby(['sex','smoker']).first()
tips.groupby(['sex', 'smoker']).first()


Other Considerations
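
The comparison_with_stata.rst changes also move the ``-`` in the
``months_between`` expression to the start of the continuation line, the
break-before-binary-operator style that PEP-8 now recommends. A small sketch
on toy dates (hypothetical values, not the tips dataset) showing the same
expression shape:

    import pandas as pd

    # Toy stand-in for the tips frame used in the docs.
    df = pd.DataFrame({'date1': pd.to_datetime(['2013-01-15', '2014-03-01']),
                       'date2': pd.to_datetime(['2015-02-15', '2015-06-20'])})

    # Line break placed before the binary operator, as in the updated hunk.
    # The result is the month difference between date1 and date2 per row.
    df['months_between'] = (df['date2'].dt.to_period('M')
                            - df['date1'].dt.to_period('M'))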