From 4c76f89dfc53a253213d409f474aa94a5ebd4d82 Mon Sep 17 00:00:00 2001
From: chapman siu
Date: Wed, 8 Jan 2014 20:13:02 +1100
Subject: [PATCH 1/4] DOC: r match function

---
 doc/source/comparison_with_r.rst | 41 ++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/doc/source/comparison_with_r.rst b/doc/source/comparison_with_r.rst
index 0738930b4ea25..1acdda89d6150 100644
--- a/doc/source/comparison_with_r.rst
+++ b/doc/source/comparison_with_r.rst
@@ -66,6 +66,44 @@ function.
 For more details and examples see :ref:`the groupby documentation
 `.
 
+|match|_
+~~~~~~~~~~~~
+
+A common way to select data in R is using ``%in%`` which is defined using the
+function ``match``. The operator ``%in%`` is used to return a logical vector
+indicating if there is a match or not:
+
+.. code-block:: r
+
+   s <- 0:4
+   s %in% c(2,4)
+
+The :meth:`~pandas.DataFrame.isin` method is similar to R ``%in%`` operator:
+
+.. ipython:: python
+
+   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
+   s.isin([2, 4])
+
+The ``match`` function returns a vector of the positions of matches
+of its first argument in its second:
+
+.. code-block:: r
+
+   s <- 0:4
+   match(s, c(2,4))
+
+The :meth:`~pandas.core.groupby.GroupBy.apply` method can be used to replicate
+this:
+
+.. ipython:: python
+
+   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
+   s.apply(lambda x: [2, 4].index(x) if x in [2,4] else np.nan)
+
+For more details and examples see :ref:`the reshaping documentation
+`.
+
 |tapply|_
 ~~~~~~~~~
 
@@ -372,6 +410,9 @@ For more details and examples see :ref:`the reshaping documentation
 .. |aggregate| replace:: ``aggregate``
 .. _aggregate: http://finzi.psych.upenn.edu/R/library/stats/html/aggregate.html
 
+.. |match| replace:: ``match`` / ``%in%``
+.. _match: http://finzi.psych.upenn.edu/R/library/base/html/match.html
+
 .. |tapply| replace:: ``tapply``
 .. _tapply: http://finzi.psych.upenn.edu/R/library/base/html/tapply.html
 

From 3bc77624c1c58dfb39c1571f92ffb7ce23f13c70 Mon Sep 17 00:00:00 2001
From: Chapman Siu
Date: Wed, 8 Jan 2014 22:15:17 +1100
Subject: [PATCH 2/4] Update comparison_with_r.rst

rm index
---
 doc/source/comparison_with_r.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/comparison_with_r.rst b/doc/source/comparison_with_r.rst
index 1acdda89d6150..5eb557a325374 100644
--- a/doc/source/comparison_with_r.rst
+++ b/doc/source/comparison_with_r.rst
@@ -82,7 +82,7 @@ The :meth:`~pandas.DataFrame.isin` method is similar to R ``%in%`` operator:
 
 .. ipython:: python
 
-   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
+   s = pd.Series(np.arange(5),dtype=np.float32)
    s.isin([2, 4])
 
 The ``match`` function returns a vector of the positions of matches

From 7fc2a211eaaa8f532b95a925f69703a6e55f4688 Mon Sep 17 00:00:00 2001
From: Chapman Siu
Date: Wed, 8 Jan 2014 22:16:59 +1100
Subject: [PATCH 3/4] Update comparison_with_r.rst

---
 doc/source/comparison_with_r.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/comparison_with_r.rst b/doc/source/comparison_with_r.rst
index 5eb557a325374..3cee4608b168f 100644
--- a/doc/source/comparison_with_r.rst
+++ b/doc/source/comparison_with_r.rst
@@ -98,7 +98,7 @@ this:
 
 .. ipython:: python
 
-   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
+   s = pd.Series(np.arange(5),dtype=np.float32)
    s.apply(lambda x: [2, 4].index(x) if x in [2,4] else np.nan)
 
 For more details and examples see :ref:`the reshaping documentation

From 5c35dec026510cebdf9d19bb6c9ba8c7c0291042 Mon Sep 17 00:00:00 2001
From: Chapman Siu
Date: Wed, 15 Jan 2014 20:13:11 +1100
Subject: [PATCH 4/4] Update comparison_with_r.rst

---
 doc/source/comparison_with_r.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/comparison_with_r.rst b/doc/source/comparison_with_r.rst
index 3cee4608b168f..a5b8fabac11ac 100644
--- a/doc/source/comparison_with_r.rst
+++ b/doc/source/comparison_with_r.rst
@@ -99,7 +99,7 @@ this:
 .. ipython:: python
 
    s = pd.Series(np.arange(5),dtype=np.float32)
-   s.apply(lambda x: [2, 4].index(x) if x in [2,4] else np.nan)
+   Series(pd.match(s,[2,4],np.nan))
 
 For more details and examples see :ref:`the reshaping documentation
 `.
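

Note on the examples in this series: the two R idioms the patch documents (``%in%`` and ``match``) can be tried outside the docs build with the standalone sketch below. It is not part of the patch. It assumes a pandas version where the top-level ``pd.match`` used in PATCH 4/4 may be unavailable, so it swaps in ``Index.get_indexer`` for the position lookup; the names ``lookup``, ``is_member``, ``positions`` and ``matched`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same Series as in the patched docs example.
s = pd.Series(np.arange(5), dtype=np.float32)
lookup = [2, 4]  # plays the role of c(2, 4) in the R snippets

# R: s %in% c(2, 4)  ->  element-wise membership test.
is_member = s.isin(lookup)

# R: match(s, c(2, 4))  ->  position of each element of s within lookup.
# Assumption: Index.get_indexer stands in for pd.match here. It returns
# 0-based positions and -1 where there is no match, whereas R returns
# 1-based positions and NA.
positions = pd.Index(lookup).get_indexer(s)

# Replace the -1 sentinel with NaN to mirror the np.nan fill used in the patch.
matched = pd.Series(positions, index=s.index, dtype="float64").where(positions >= 0)

print(is_member)
print(matched)
```

Keeping the raw ``positions`` array around is handy when the result is used for indexing; the NaN-filled ``matched`` Series is closer to what the docs example displays.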