    :suppress:

    from pandas import *
+   import numpy as np
+   np.random.seed(123456)
+   import matplotlib.pyplot as plt
+   plt.close('all')
+   options.display.mpl_style = 'default'
    options.display.max_rows=15

@@ -71,24 +76,26 @@ Converting DataFrames into R objects
 .. versionadded:: 0.8

 Starting from pandas 0.8, there is **experimental** support to convert
-DataFrames into the equivalent R object (that is, **data.frame**):
+``DataFrame`` into the equivalent R object (that is, **data.frame**) using the ``convert_to_r_dataframe`` function:

-.. ipython:: python

-   from pandas import DataFrame
+.. ipython:: python

    df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
                   index=["one", "two", "three"])
    r_dataframe = com.convert_to_r_dataframe(df)

-   print(type(r_dataframe))
+   type(r_dataframe)
    print(r_dataframe)

+   print(r_dataframe.rownames)
+   print(r_dataframe.colnames)
+

-The DataFrame's index is stored as the ``rownames`` attribute of the
-data.frame instance.
+The index of the ``rpy2.robjects.vectors.DataFrame`` is stored as the ``rownames`` attribute, and the columns are stored as the
+``colnames`` attribute.

-You can also use **convert_to_r_matrix** to obtain a ``Matrix`` instance, but
+You can also use ``convert_to_r_matrix`` to obtain a ``rpy2.robjects.vectors.Matrix`` instance, but
 bear in mind that it will only work with homogeneously-typed DataFrames (as
 R matrices bear no information on the data type):

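The homogeneity requirement described above can be checked on the pandas side before attempting a matrix conversion. The following is a minimal sketch using only pandas; the helper name `is_homogeneous` is hypothetical and is not part of `pandas.rpy.common` or rpy2.

```python
import pandas as pd


def is_homogeneous(frame):
    """Return True if every column of ``frame`` shares a single dtype."""
    # Hypothetical helper for illustration; not a pandas or rpy2 API.
    return frame.dtypes.nunique() <= 1


# Same all-integer frame as in the documentation example.
numeric = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                       index=["one", "two", "three"])

# Adding a string column makes the frame mixed-type.
mixed = numeric.assign(D=['x', 'y', 'z'])

print(is_homogeneous(numeric))
print(is_homogeneous(mixed))
```

A frame that fails this check is probably better passed through ``convert_to_r_dataframe``, which keeps per-column types.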
@@ -97,14 +104,186 @@ R matrices bear no information on the data type):

    r_matrix = com.convert_to_r_matrix(df)

-   print(type(r_matrix))
+   type(r_matrix)
    print(r_matrix)


 Calling R functions with pandas objects
 ---------------------------------------

+It is easier to use ``rpy2.robjects`` directly to call R functions.
+You can retrieve an R object (including an R function) from the R namespace by dictionary-style access on ``robjects.r``.
+
+The example below shows how to retrieve R's **sum** function and pass it an ``rpy2.robjects.vectors.DataFrame``.
+Note that the value returned from R's **sum** is stored as an ``robjects.vectors.Vector`` type.
+Thus, specify an index to get the raw values.
+
+See the `RPy2 documentation <http://rpy.sourceforge.net/rpy2/doc-2.2/html/index.html>`__ for more.
+
+
+.. ipython:: python
+
+   import rpy2.robjects as robjects
+
+   rsum = robjects.r['sum']
+   rsum_result = rsum(r_dataframe)
+
+   type(rsum_result)
+   rsum_result[0]
+
+
+Preparing Data for R
+--------------------
+
+Load the Iris dataset and convert it to an R **data.frame**.
+You can pass an ``rpy2.robjects.vectors.DataFrame`` to the R namespace using ``rpy2.robjects.r.assign``.
+In the following example, the ``r_iris`` DataFrame can be referred to as ``iris`` in the R namespace.
+
+
+.. ipython:: python
+
+   iris = com.load_data('iris')
+   iris.head()
+
+   r_iris = com.convert_to_r_dataframe(iris)
+   robjects.r.assign('iris', r_iris);
+
+
+You can convert each data type using R functions if required.
+Calling ``robjects.r`` as a function executes the passed R expression in R's namespace.
+For example, we can check the data types using R's **str** function,
+then convert the "Species" column to a categorical type (Factor) using R's **factor** function.
+
+
+.. ipython:: python
+
+   print(robjects.r('str(iris)'))
+
+   robjects.r('iris$Species <- factor(iris$Species)');
+   print(robjects.r('str(iris)'))


 High-level interface to R estimators
 ------------------------------------
+
+Use the "setosa" data in the iris data set to perform linear regression.
+It is much easier to prepare and slice data on the pandas side, then convert it to an R **data.frame**.
+
+
+.. ipython:: python
+
+   setosa = iris[iris['Species'] == 'setosa']
+   setosa.head()
+
+   r_setosa = com.convert_to_r_dataframe(setosa)
+   robjects.r.assign('setosa', r_setosa);
+
+
+Once the DataFrame is passed to the R namespace, you can execute an R formula to perform linear regression.
+
+
+.. ipython:: python
+
+   robjects.r('result <- lm(Sepal.Length~Sepal.Width, data=setosa)');
+   print(robjects.r('summary(result)'))
+
+
+You can retrieve the result from the R namespace into the Python namespace via ``rpy2.robjects.r``.
+If the returned value is an R named list, you can check the list of keys via the ``names`` attribute.
+To get the raw values, access each element by index.
+
+
+.. ipython:: python
+
+   result = robjects.r['result']
+
+   print(result.names)
+   print(result.rx('coefficients'))
+
+   intercept, coef1 = result.rx('coefficients')[0]
+   intercept
+   coef1
+
+
+The ``convert_robj`` function converts the retrieved data to a Python-friendly data type.
+In the example below, the retrieved R object holding the fitted values and prediction intervals is
+converted to a pandas ``DataFrame``.
+
+
+.. ipython:: python
+
+   robjects.r('predicted <- predict(result, setosa, interval="prediction")');
+   print(robjects.r('head(predicted)'))
+
+   predicted = robjects.r['predicted']
+   type(predicted)
+
+   predicted = com.convert_robj(predicted)
+   type(predicted)
+   predicted.head()
+
+
+Handling Time Series
+--------------------
+
+Currently, there is no easy way to create R's built-in **ts** object from a pandas time series.
+Also, a ``Series`` cannot be converted using the ``convert_to_r_dataframe`` function.
+Thus, you must create an ``rpy2.robjects.vectors.Vector`` instance manually before calling ``robjects.r.assign``.
+
+Use the corresponding ``Vector`` class depending on the intended data type.
+See the rpy2 documentation `Vectors and arrays <http://rpy.sourceforge.net/rpy2/doc-2.2/html/vector.html>`__ for more.
+
+Once the ``Vector`` is passed to R's namespace, call R's **ts** function to create a **ts** object.
+
+
+.. ipython:: python
+
+   idx = date_range(start='2013-01-01', freq='M', periods=48)
+   vts = Series(np.random.randn(48), index=idx).cumsum()
+   vts
+
+   r_values = robjects.FloatVector(vts.values)
+   robjects.r.assign('values', r_values);
+
+   robjects.r('vts <- ts(values, start=c(2013, 1, 1), frequency=12)');
+   print(robjects.r['vts'])
+
+
+The example below performs seasonal decomposition using R's **stl** function and gets the result as the ``converted`` ``DataFrame``.
+Because R's **ts** index cannot be retrieved by ``convert_robj``, assign a ``DatetimeIndex`` manually after retrieval.
+
+
+.. ipython:: python
+
+   robjects.r('result <- stl(vts, s.window=12)');
+   result = robjects.r['result']
+
+   print(result.names)
+
+   result_ts = result.rx('time.series')[0]
+   converted = com.convert_robj(result_ts)
+   converted.head()
+
+   converted.index = idx
+   converted.head()
+
+
+Now that you have a pandas ``DataFrame``, you can perform further operations easily.
+
+
+.. ipython:: python
+
+   fig, axes = plt.subplots(4, 1)
+
+   axes[0].set_ylabel('Original');
+   ax = vts.plot(ax=axes[0]);
+   axes[1].set_ylabel('Trend');
+   ax = converted['trend'].plot(ax=axes[1]);
+
+   axes[2].set_ylabel('Seasonal');
+   ax = converted['seasonal'].plot(ax=axes[2]);
+
+   axes[3].set_ylabel('Residuals');
+   @savefig rpy2_timeseries.png
+   converted['remainder'].plot(ax=axes[3])
+
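The manual index assignment in the time series section relies on knowing the start and frequency that were passed to R's **ts**. As a minimal sketch, the same kind of index can be rebuilt on the pandas side from those two values alone. The helper `ts_index` and its frequency mapping are hypothetical, cover only monthly and quarterly data, and use period-start labels (`MS`, `QS`) rather than the month-end labels used in the diff.

```python
import pandas as pd

# Assumed mapping from R's ``frequency`` argument to pandas offset aliases.
_R_FREQ_TO_PANDAS = {12: 'MS', 4: 'QS'}


def ts_index(start_year, start_period, frequency, n):
    """Build a DatetimeIndex for ts(..., start=c(year, period), frequency=...).

    Hypothetical helper for illustration; not a pandas or rpy2 API.
    """
    alias = _R_FREQ_TO_PANDAS[frequency]
    # Number of calendar months covered by one period (1 or 3 here).
    months_per_period = 12 // frequency
    start_month = (start_period - 1) * months_per_period + 1
    start = pd.Timestamp(year=start_year, month=start_month, day=1)
    return pd.date_range(start=start, periods=n, freq=alias)


# Mirrors ts(values, start=c(2013, 1), frequency=12) with 48 observations.
idx = ts_index(2013, 1, 12, 48)
print(idx[0], idx[-1], len(idx))
```

An index built this way could then be assigned with ``converted.index = idx``, as in the example above.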