Commit 62915b9

DOC: Complete R interface section
1 parent 3f24b87 commit 62915b9

1 file changed (+187, -8 lines)

doc/source/r_interface.rst

.. ipython:: python
   :suppress:

   from pandas import *
   import numpy as np
   np.random.seed(123456)
   import matplotlib.pyplot as plt
   plt.close('all')
   options.display.mpl_style = 'default'
   options.display.max_rows=15

Converting DataFrames into R objects
------------------------------------

.. versionadded:: 0.8

Starting from pandas 0.8, there is **experimental** support to convert a
``DataFrame`` into the equivalent R object (that is, a **data.frame**) using the
``convert_to_r_dataframe`` function:

.. ipython:: python

   df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=["one", "two", "three"])
   r_dataframe = com.convert_to_r_dataframe(df)

   type(r_dataframe)
   print(r_dataframe)

   print(r_dataframe.rownames)
   print(r_dataframe.colnames)

The index of the ``DataFrame`` is stored as the ``rownames`` attribute of the
resulting ``rpy2.robjects.vectors.DataFrame``, and its columns are stored as the
``colnames`` attribute.
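
For readers less familiar with R, an R **data.frame** is a list of equal-length column vectors that carries column names and row names as attributes. The plain-Python sketch below uses a hypothetical ``r_dataframe_parts`` helper (not part of pandas or rpy2) to show how the example ``df`` lines up with those pieces; it only illustrates the mapping and does not reproduce what ``convert_to_r_dataframe`` does internally.

```python
# Hypothetical helper, for illustration only: split a column-oriented table
# into the pieces of an R data.frame (column vectors, colnames, rownames).
def r_dataframe_parts(columns, index):
    """Split a {name: values} mapping plus an index into R data.frame parts."""
    colnames = list(columns)
    # Like an R data.frame, every column must have one value per row.
    for name in colnames:
        if len(columns[name]) != len(index):
            raise ValueError(f"column {name!r} does not match the index length")
    return {
        "rownames": [str(label) for label in index],  # R row names are strings
        "colnames": colnames,
        "vectors": [list(columns[name]) for name in colnames],
    }


parts = r_dataframe_parts({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                          index=["one", "two", "three"])
print(parts["rownames"])  # ['one', 'two', 'three']
print(parts["colnames"])  # ['A', 'B', 'C']
```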

You can also use ``convert_to_r_matrix`` to obtain an ``rpy2.robjects.vectors.Matrix`` instance, but
bear in mind that it will only work with homogeneously-typed DataFrames (as
an R matrix holds a single data type for all of its elements):

.. ipython:: python

   r_matrix = com.convert_to_r_matrix(df)

   type(r_matrix)
   print(r_matrix)
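
An R matrix is a single atomic vector with a ``dim`` attribute, so all of its cells share one type. The sketch below is built around a hypothetical ``common_type`` helper (not a pandas function) and shows the kind of check that decides whether a table's columns could be packed into one matrix.

```python
# Hypothetical helper, for illustration only: an R matrix keeps every cell in
# one vector, so the columns can only be combined if they share a single type.
def common_type(columns):
    """Return the element type shared by all columns, or None if they are mixed."""
    types = {type(value) for values in columns.values() for value in values}
    return types.pop() if len(types) == 1 else None


print(common_type({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}))  # <class 'int'>
print(common_type({'A': [1, 2], 'B': ['x', 'y']}))  # None
```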

Calling R functions with pandas objects
---------------------------------------

The easiest way to call R functions is to use ``rpy2.robjects`` directly.
You can retrieve any R object (including R functions) from the R namespace by dictionary-style access on ``robjects.r``.

The example below retrieves R's **sum** function and passes it an ``rpy2.robjects.vectors.DataFrame``.
Note that the value returned by R's **sum** is an R vector (an ``rpy2.robjects.vectors.Vector`` subclass), even though it holds a single number.
Thus, index into it to get the raw value.

See the `RPy2 documentation <http://rpy.sourceforge.net/rpy2/doc-2.2/html/index.html>`__ for more.

.. ipython:: python

   import rpy2.robjects as robjects

   rsum = robjects.r['sum']
   rsum_result = rsum(r_dataframe)

   type(rsum_result)
   rsum_result[0]
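
As a quick sanity check, R's **sum** applied to a data.frame adds up every cell, so for the example ``df`` the single element ``rsum_result[0]`` should equal the grand total computed below in plain Python.

```python
# Plain-Python cross-check of R's sum() over the example data.frame:
# every cell of every column is added into one grand total.
columns = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
grand_total = sum(sum(values) for values in columns.values())
print(grand_total)  # 45
```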

Preparing Data for R
--------------------

Load the Iris dataset and convert it to an R **data.frame**.
You can pass an ``rpy2.robjects.vectors.DataFrame`` into the R namespace using ``rpy2.robjects.r.assign``.
In the following example, the ``r_iris`` DataFrame can be referred to as ``iris`` in the R namespace.

.. ipython:: python

   iris = com.load_data('iris')
   iris.head()

   r_iris = com.convert_to_r_dataframe(iris)
   robjects.r.assign('iris', r_iris);

If required, you can convert the data type of each column using R functions.
Calling ``robjects.r`` as a function evaluates the passed string as R code in R's namespace.
For example, we can check the column data types using R's **str** function,
then convert the "Species" column to a categorical type (factor) using R's **factor** function.

.. ipython:: python

   print(robjects.r('str(iris)'))

   robjects.r('iris$Species <- factor(iris$Species)');
   print(robjects.r('str(iris)'))
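
Under the hood, an R factor stores integer codes plus a table of levels. The plain-Python sketch below (the ``r_style_factor`` name is hypothetical, and this is neither R nor rpy2 code) mimics R's default behaviour: the levels are the sorted unique values and the codes are 1-based. It assumes simple ASCII labels such as the iris species names, for which Python's sort order matches R's.

```python
# Illustrative sketch of R's default factor() encoding (hypothetical helper).
def r_style_factor(values):
    levels = sorted(set(values))  # R's default levels: sorted unique values
    position = {level: i + 1 for i, level in enumerate(levels)}  # codes start at 1
    codes = [position[value] for value in values]
    return levels, codes


levels, codes = r_style_factor(['setosa', 'virginica', 'setosa', 'versicolor'])
print(levels)  # ['setosa', 'versicolor', 'virginica']
print(codes)   # [1, 3, 1, 2]
```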

High-level interface to R estimators
------------------------------------

The following example uses the "setosa" rows of the iris dataset to perform a linear regression.
It is much easier to prepare and slice the data on the pandas side, and then convert it to an R **data.frame**.

.. ipython:: python

   setosa = iris[iris['Species'] == 'setosa']
   setosa.head()

   r_setosa = com.convert_to_r_dataframe(setosa)
   robjects.r.assign('setosa', r_setosa);

Once the DataFrame has been passed to the R namespace, you can execute R code to perform the linear regression.
Here, R's **lm** function fits the model described by the formula ``Sepal.Length ~ Sepal.Width``.

.. ipython:: python

   robjects.r('result <- lm(Sepal.Length~Sepal.Width, data=setosa)');
   print(robjects.r('summary(result)'))

You can retrieve the result from the R namespace into the Python namespace via ``rpy2.robjects.r``.
If the returned value is an R named list, you can check its keys via the ``names`` attribute.
To get the raw values, extract an element by name with ``rx`` and then index into it.

.. ipython:: python

   result = robjects.r['result']

   print(result.names)
   print(result.rx('coefficients'))

   intercept, coef1 = result.rx('coefficients')[0]
   intercept
   coef1
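
With a single predictor, **lm** fits ordinary least squares, whose slope and intercept have a closed form. The sketch below (the ``simple_ols`` helper is hypothetical) recomputes them in plain Python and can serve as a cross-check on ``intercept`` and ``coef1``.

```python
# Closed-form ordinary least squares for y = intercept + slope * x
# (hypothetical helper, for cross-checking lm's coefficients).
def simple_ols(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


print(simple_ols([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```

Because a pandas ``Series`` supports ``len``, iteration and ``sum``, calling ``simple_ols(setosa['Sepal.Width'], setosa['Sepal.Length'])`` should agree with ``(intercept, coef1)`` up to floating-point rounding.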

The ``convert_robj`` function converts retrieved R data to a Python-friendly data type.
In the example below, the retrieved R matrix of fitted values and prediction intervals
is converted to a pandas ``DataFrame``.

.. ipython:: python

   robjects.r('predicted <- predict(result, setosa, interval="prediction")');
   print(robjects.r('head(predicted)'))

   predicted = robjects.r['predicted']
   type(predicted)

   predicted = com.convert_robj(predicted)
   type(predicted)
   predicted.head()
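
With ``interval="prediction"``, R reports the fitted value (``fit``) together with lower (``lwr``) and upper (``upr``) bounds. For a single-predictor model these bounds follow the textbook formula fit ± t · s · sqrt(1 + 1/n + (x0 - x̄)² / Sxx). The sketch below implements that formula with a hypothetical ``prediction_interval`` helper; the caller supplies the Student t critical value ``t_crit`` (n - 2 degrees of freedom) because the standard library has no t distribution. For the 50 setosa rows, R's ``qt(0.975, 48)`` is roughly 2.01.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (fit, lwr, upr) of a simple linear regression at x0 (hypothetical helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard error, estimated with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    fit = intercept + slope * x0
    # The leading 1 accounts for the variance of the new observation itself.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
    return fit, fit - half_width, fit + half_width


# Toy data; 3.182 is roughly the 97.5% t quantile with 3 degrees of freedom.
print(prediction_interval([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1], 3.0, 3.182))
```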

Handling Time Series
--------------------

Currently, there is no easy way to create R's built-in **ts** object from a pandas time series.
Also, a ``Series`` cannot be converted using the ``convert_to_r_dataframe`` function.
Thus, you must create an ``rpy2.robjects.vectors.Vector`` instance manually before calling ``robjects.r.assign``.

Use the ``Vector`` subclass that corresponds to the intended data type (for example, ``FloatVector`` for floating-point values).
See the rpy2 documentation `Vectors and arrays <http://rpy.sourceforge.net/rpy2/doc-2.2/html/vector.html>`__ for more.

Once the ``Vector`` has been passed to R's namespace, call R's **ts** function to create a **ts** object.

.. ipython:: python

   idx = date_range(start='2013-01-01', freq='M', periods=48)
   vts = Series(np.random.randn(48), index=idx).cumsum()
   vts

   r_values = robjects.FloatVector(vts.values)
   robjects.r.assign('values', r_values);

   robjects.r('vts <- ts(values, start=c(2013, 1), frequency=12)');
   print(robjects.r['vts'])
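
R places each observation of a **ts** object on a fractional time axis: with ``start=c(2013, 1)`` and ``frequency=12``, observation *i* (counting from 0) sits at ``2013 + i/12``, which is what R's ``time()`` reports. The hypothetical helper below reproduces that arithmetic in plain Python, together with the (year, period) pair each observation falls in.

```python
# Hypothetical helper mirroring R's ts() time axis for a given start and frequency.
def r_ts_positions(start_year, start_period, frequency, n):
    positions = []
    for i in range(n):
        offset = start_period - 1 + i
        year_offset, period_index = divmod(offset, frequency)
        # (fractional time as time() reports it, calendar year, 1-based period)
        positions.append((start_year + offset / frequency,
                          start_year + year_offset,
                          period_index + 1))
    return positions


print(r_ts_positions(2013, 1, 12, 14)[12])  # (2014.0, 2014, 1)
```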

The example below performs a seasonal decomposition using R's **stl** function and retrieves the result as the ``converted`` ``DataFrame``.
Because ``convert_robj`` cannot retrieve the time index of an R **ts** object, assign the ``DatetimeIndex`` manually after retrieval.

.. ipython:: python

   robjects.r('result <- stl(vts, s.window=12)');
   result = robjects.r['result']

   print(result.names)

   result_ts = result.rx('time.series')[0]
   converted = com.convert_robj(result_ts)
   converted.head()

   converted.index = idx
   converted.head()
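
The ``idx`` reused above holds month-end dates, since ``freq='M'`` anchors ``date_range`` to the last day of each month. If the original index is no longer at hand, a standard-library sketch such as the hypothetical ``month_ends`` helper below can rebuild the same dates.

```python
import calendar
import datetime


# Hypothetical helper: month-end dates, like date_range(start=..., freq='M').
def month_ends(year, month, periods):
    dates = []
    for _ in range(periods):
        last_day = calendar.monthrange(year, month)[1]  # days in this month
        dates.append(datetime.date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


dates = month_ends(2013, 1, 48)
print(dates[0], dates[1], dates[-1])  # 2013-01-31 2013-02-28 2016-12-31
```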

Now that you have a pandas ``DataFrame``, you can easily perform further operations, such as plotting each component:

.. ipython:: python

   fig, axes = plt.subplots(4, 1)

   axes[0].set_ylabel('Original');
   ax = vts.plot(ax=axes[0]);
   axes[1].set_ylabel('Trend');
   ax = converted['trend'].plot(ax=axes[1]);

   axes[2].set_ylabel('Seasonal');
   ax = converted['seasonal'].plot(ax=axes[2]);

   axes[3].set_ylabel('Residuals');
   @savefig rpy2_timeseries.png
   converted['remainder'].plot(ax=axes[3])
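
R's **stl** is an additive decomposition: ``remainder`` is the original series minus ``seasonal`` and ``trend``, so the three components add back up to the input. The hypothetical helper below checks that identity on plain lists; with the objects above, the analogous pandas check would compare ``converted.sum(axis=1)`` with ``vts``.

```python
import math


# Hypothetical helper: does trend + seasonal + remainder reproduce the series?
def reconstructs(original, trend, seasonal, remainder, tol=1e-9):
    if not (len(original) == len(trend) == len(seasonal) == len(remainder)):
        return False
    return all(
        math.isclose(o, t + s + r, abs_tol=tol)
        for o, t, s, r in zip(original, trend, seasonal, remainder)
    )


print(reconstructs([1.0, 2.5], [0.8, 2.0], [0.1, 0.4], [0.1, 0.1]))  # True
```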