~~~~~~~~~~~~

In R you may want to split data into subsets and compute the mean for each.
Using a data.frame called ``df`` and splitting it into groups ``by1`` and
``by2``:

.. code-block:: r
@@ -69,13 +69,13 @@ For more details and examples see :ref:`the groupby documentation
|tapply|_
~~~~~~~~~

``tapply`` is similar to ``aggregate``, but data can be in a ragged array,
since the subclass sizes may be irregular. Using a data.frame called
``baseball``, and retrieving information based on the array ``team``:

.. code-block:: r

   baseball <-
     data.frame(team = gl(5, 5,
                labels = paste("Team", LETTERS[1:5])),
                player = sample(letters, 25),
@@ -122,7 +122,7 @@ index/slice as well as standard boolean indexing:

.. ipython:: python

   df = DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
   df.query('a <= b')
   df[df.a <= df.b]
   df.loc[df.a <= df.b]
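
The three selections above should return the same rows. The following minimal sketch checks this directly. It assumes a recent pandas and NumPy, and the seeded generator ``rng`` is an illustrative choice rather than part of the original example:

```python
import numpy as np
import pandas as pd

# Seeded generator so the comparison is reproducible (illustrative choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

via_query = df.query('a <= b')   # string expression evaluated against the columns
via_mask = df[df.a <= df.b]      # plain boolean indexing
via_loc = df.loc[df.a <= df.b]   # boolean indexing through .loc

# All three select the same rows in the same order.
pd.testing.assert_frame_equal(via_query, via_mask)
pd.testing.assert_frame_equal(via_query, via_loc)
```

``pd.testing.assert_frame_equal`` raises an ``AssertionError`` if the frames differ, so a silent run means the three forms agreed.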
@@ -150,7 +150,7 @@ In ``pandas`` the equivalent expression, using the

.. ipython:: python

   df = DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
   df.eval('a + b')
   df.a + df.b  # same as the previous expression

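A similar sketch for ``eval``, again using an illustrative seeded generator, compares the string expression with the plain column arithmetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

via_eval = df.eval('a + b')  # expression string resolved against df's columns
via_ops = df.a + df.b        # the same sum written with ordinary operators

# Compare the values element-wise, ignoring Series names.
np.testing.assert_allclose(via_eval.to_numpy(), via_ops.to_numpy())
```
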
@@ -330,7 +330,7 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:
   pd.pivot_table(mdf, values='value', index=['variable', 'week'],
                  columns=['month'], aggfunc=np.mean)
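
Because ``mdf`` is built earlier in the document, here is a self-contained sketch that uses a small hypothetical long-format frame with the same column names. It passes ``index=``/``columns=``, the keyword names current pandas accepts, and uses the string ``'mean'`` as the aggregation:

```python
import pandas as pd

# Hypothetical long-format data standing in for ``mdf``:
# one row per (variable, week, month) observation.
mdf = pd.DataFrame({
    'variable': ['x', 'x', 'x', 'y', 'y', 'y'],
    'week':     [1, 1, 2, 1, 2, 2],
    'month':    [5, 5, 5, 5, 6, 6],
    'value':    [1.0, 3.0, 4.0, 2.0, 6.0, 8.0],
})

# Rows keyed by (variable, week), one column per month, cells hold the mean.
wide = pd.pivot_table(mdf, values='value', index=['variable', 'week'],
                      columns=['month'], aggfunc='mean')
print(wide)
```

Cells with no observations, such as ``x`` in week 1 of month 6, come out as ``NaN``.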

Similarly, for ``dcast``, which uses a data.frame called ``df`` in R to
aggregate information based on ``Animal`` and ``FeedType``:

.. code-block:: r
@@ -341,10 +341,10 @@ aggregate information based on ``Animal`` and ``FeedType``:
     FeedType = c('A', 'B', 'A', 'A', 'B', 'B', 'A'),
     Amount = c(10, 7, 4, 2, 5, 6, 2)
   )

   dcast(df, Animal ~ FeedType, sum, fill=NaN)
   # Alternative method using base R
   with(df, tapply(Amount, list(Animal, FeedType), sum))

Python can approach this in two different ways. The first, as in the previous
example, uses :meth:`~pandas.pivot_table`:
@@ -365,7 +365,7 @@ The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:
.. ipython:: python

   df.groupby(['Animal', 'FeedType'])['Amount'].sum()

For more details and examples, see :ref:`the reshaping documentation
<reshaping.pivot>` or :ref:`the groupby documentation <groupby.split>`.

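Both Python routes compute the same per-group totals, and only the layout differs. The sketch below builds the R example's data and compares the two. ``FeedType`` and ``Amount`` follow the R snippet, while the ``Animal`` labels are assumed for illustration because they are not shown in this excerpt:

```python
import pandas as pd

# FeedType and Amount follow the R example; the Animal labels are assumed.
df = pd.DataFrame({
    'Animal': ['Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1',
               'Animal2', 'Animal3'],
    'FeedType': ['A', 'B', 'A', 'A', 'B', 'B', 'A'],
    'Amount': [10, 7, 4, 2, 5, 6, 2],
})

# Wide result, analogous to dcast(df, Animal ~ FeedType, sum).
wide = df.pivot_table(values='Amount', index='Animal', columns='FeedType',
                      aggfunc='sum')

# Long result from groupby, one row per (Animal, FeedType) pair.
totals = df.groupby(['Animal', 'FeedType'])['Amount'].sum()

print(wide)
print(totals.unstack())  # reshaped to the same wide layout for comparison
```

``unstack()`` moves the inner ``FeedType`` level of the groupby result into columns, which gives the same table as ``pivot_table``. The ``Animal3``/``B`` cell has no rows, so it becomes ``NaN``.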