@@ -30,6 +30,77 @@ R packages.
 Base R
 ------

+|aggregate|_
+~~~~~~~~~~~~
+
+In R you may want to split data into subsets and compute the mean for each.
+Using a data.frame called ``df`` and splitting it into groups ``by1`` and
+``by2``:
+
+.. code-block:: r
+
+   df <- data.frame(
+     v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
+     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99),
+     by1 = c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12),
+     by2 = c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA))
+   aggregate(x=df[, c("v1", "v2")], by=list(df$by1, df$by2), FUN = mean)
+
+The :meth:`~pandas.DataFrame.groupby` method is similar to the base R
+``aggregate`` function.
+
+.. ipython:: python
+
+   from pandas import DataFrame
+   df = DataFrame({
+     'v1': [1,3,5,7,8,3,5,np.nan,4,5,7,9],
+     'v2': [11,33,55,77,88,33,55,np.nan,44,55,77,99],
+     'by1': ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
+     'by2': ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99, np.nan,
+             np.nan]
+   })
+
+   g = df.groupby(['by1','by2'])
+   g[['v1','v2']].mean()
+
+For more details and examples see :ref:`the groupby documentation
+<groupby.split>`.
+
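As a minimal sketch of how ``groupby`` treats missing group keys, the following uses a small hypothetical frame (not the data above) with string-only keys. It assumes a recent pandas release, where ``groupby`` drops rows whose key is ``NaN`` by default; the names ``means`` and ``flat`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a missing value in the key column by1.
df = pd.DataFrame({
    "v1": [1.0, 3.0, 5.0, np.nan],
    "v2": [11.0, 33.0, 55.0, 77.0],
    "by1": ["red", "red", "blue", np.nan],
    "by2": ["wet", "wet", "dry", "dry"],
})

# Group on both keys and average the value columns.
# Rows whose key is NaN are left out of the result by default.
means = df.groupby(["by1", "by2"])[["v1", "v2"]].mean()
print(means)

# reset_index() turns the group keys back into ordinary columns,
# which is closer to the flat layout that R's aggregate returns.
flat = means.reset_index()
print(flat)
```

The result is indexed by the ``(by1, by2)`` pairs; ``reset_index()`` is only needed when a flat table is preferred.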
+|tapply|_
+~~~~~~~~~
+
+``tapply`` is similar to ``aggregate``, but the data can be in a ragged array,
+since the subgroup sizes may be irregular. Using a data.frame called
+``baseball``, and retrieving the maximum batting average for each ``team``:
+
+.. code-block:: r
+
+   baseball <-
+     data.frame(team = gl(5, 5,
+                labels = paste("Team", LETTERS[1:5])),
+                player = sample(letters, 25),
+                batting.average = runif(25, .200, .400))
+
+   tapply(baseball$batting.average, baseball$team,
+          max)
+
+In ``pandas`` we may use the :meth:`~pandas.pivot_table` method to handle this:
+
+.. ipython:: python
+
+   import random
+   import string
+
+   baseball = DataFrame({
+      'team': ["team %d" % (x+1) for x in range(5)]*5,
+      'player': random.sample(list(string.ascii_lowercase),25),
+      'batting avg': np.random.uniform(.200, .400, 25)
+      })
+   baseball.pivot_table(values='batting avg', cols='team', aggfunc=np.max)
+
+For more details and examples see :ref:`the reshaping documentation
+<reshaping.pivot>`.
+
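The per-group maximum that ``tapply`` computes can also be sketched with ``groupby``. The example below is an illustration under assumptions, not part of the change above: it uses a seeded NumPy generator for hypothetical batting averages, and it spells the pivot keyword as ``columns``, which is the name recent pandas releases use in place of ``cols``.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five teams with five players each.
rng = np.random.default_rng(0)
baseball = pd.DataFrame({
    "team": ["Team %s" % letter for letter in "ABCDE" for _ in range(5)],
    "batting avg": rng.uniform(0.200, 0.400, 25),
})

# One value per team, indexed by team name (closest to tapply's named vector).
best = baseball.groupby("team")["batting avg"].max()
print(best)

# The same numbers as a one-row table with one column per team.
wide = baseball.pivot_table(values="batting avg", columns="team", aggfunc="max")
print(wide)
```

``groupby`` returns a Series keyed by team, while ``pivot_table`` lays the teams out as columns; which one is more convenient depends on what is done with the result next.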
 |subset|_
 ~~~~~~~~~~

@@ -51,9 +122,6 @@ index/slice as well as standard boolean indexing:

 .. ipython:: python

-   from pandas import DataFrame
-   from numpy import random
-
    df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
    df.query('a <= b')
    df[df.a <= df.b]
@@ -120,8 +188,6 @@ table below shows how these data structures could be mapped in Python.
 An expression using a data.frame called ``df`` in R where you want to
 summarize ``x`` by ``month``:

-
-
 .. code-block:: r

    require(plyr)
@@ -140,16 +206,14 @@ summarize ``x`` by ``month``:
 In ``pandas`` the equivalent expression, using the
 :meth:`~pandas.DataFrame.groupby` method, would be:

-
-
 .. ipython:: python

    df = DataFrame({
-      'x': random.uniform(1., 168., 120),
-      'y': random.uniform(7., 334., 120),
-      'z': random.uniform(1.7, 20.7, 120),
+      'x': np.random.uniform(1., 168., 120),
+      'y': np.random.uniform(7., 334., 120),
+      'z': np.random.uniform(1.7, 20.7, 120),
       'month': [5,6,7,8]*30,
-      'week': random.randint(1,4, 120)
+      'week': np.random.randint(1,4, 120)
       })

    grouped = df.groupby(['month','week'])
@@ -235,8 +299,8 @@ For more details and examples see :ref:`the reshaping documentation
 |cast|_
 ~~~~~~~

-An expression using a data.frame called ``df`` in R to cast into a higher
-dimensional array:
+In R, ``acast`` casts a data.frame called ``df`` into a higher-dimensional
+array:

 .. code-block:: r

@@ -256,18 +320,60 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:
 .. ipython:: python

    df = DataFrame({
-      'x': random.uniform(1., 168., 12),
-      'y': random.uniform(7., 334., 12),
-      'z': random.uniform(1.7, 20.7, 12),
+      'x': np.random.uniform(1., 168., 12),
+      'y': np.random.uniform(7., 334., 12),
+      'z': np.random.uniform(1.7, 20.7, 12),
       'month': [5,6,7]*4,
       'week': [1,2]*6
       })
    mdf = pd.melt(df, id_vars=['month', 'week'])
    pd.pivot_table(mdf, values='value', rows=['variable','week'],
                   cols=['month'], aggfunc=np.mean)

+Similarly, ``dcast`` uses a data.frame called ``df`` in R to
+aggregate information based on ``Animal`` and ``FeedType``:
+
+.. code-block:: r
+
+   df <- data.frame(
+     Animal = c('Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1',
+                'Animal2', 'Animal3'),
+     FeedType = c('A', 'B', 'A', 'A', 'B', 'B', 'A'),
+     Amount = c(10, 7, 4, 2, 5, 6, 2)
+   )
+
+   dcast(df, Animal ~ FeedType, sum, fill=NaN)
+   # Alternative method using base R
+   with(df, tapply(Amount, list(Animal, FeedType), sum))
+
+In Python this can be approached in two different ways. First, as above,
+using :meth:`~pandas.pivot_table`:
+
+.. ipython:: python
+
+   df = DataFrame({
+     'Animal': ['Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1',
+                'Animal2', 'Animal3'],
+     'FeedType': ['A', 'B', 'A', 'A', 'B', 'B', 'A'],
+     'Amount': [10, 7, 4, 2, 5, 6, 2],
+   })
+
+   df.pivot_table(values='Amount', rows='Animal', cols='FeedType', aggfunc='sum')
+
+The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:
+
+.. ipython:: python
+
+   df.groupby(['Animal','FeedType'])['Amount'].sum()
+
 For more details and examples see :ref:`the reshaping documentation
-<reshaping.pivot>`.
+<reshaping.pivot>` or :ref:`the groupby documentation<groupby.split>`.
+
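The two approaches produce the same sums in different shapes: ``pivot_table`` gives a wide table, while ``groupby`` gives a long Series with a two-level index. The sketch below compares them on the ``Animal``/``FeedType`` data. It assumes a recent pandas release, where the pivot keywords are spelled ``index`` and ``columns`` rather than ``rows`` and ``cols``; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2", "Animal1",
               "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount": [10, 7, 4, 2, 5, 6, 2],
})

# Wide layout: one row per Animal, one column per FeedType.
wide = df.pivot_table(values="Amount", index="Animal",
                      columns="FeedType", aggfunc="sum")
print(wide)

# Long layout: one entry per (Animal, FeedType) pair that occurs in the data.
long = df.groupby(["Animal", "FeedType"])["Amount"].sum()
print(long)

# unstack() pivots the FeedType index level into columns.
unstacked = long.unstack("FeedType")
print(unstacked)
```

``unstack`` moves ``FeedType`` from the row index into the columns, so a combination with no rows (here ``Animal3`` with feed type ``B``) shows up as ``NaN`` in the wide layout.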
+.. |aggregate| replace:: ``aggregate``
+.. _aggregate: http://finzi.psych.upenn.edu/R/library/stats/html/aggregate.html
+
+.. |tapply| replace:: ``tapply``
+.. _tapply: http://finzi.psych.upenn.edu/R/library/base/html/tapply.html

 .. |with| replace:: ``with``
 .. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html