DOC: Add %in% operator into compare w r (GH3850) #5875

Merged
merged 4 commits into from
Jan 15, 2014

NoRaincheck
Contributor

This doesn't close #3850, but at least the %in% operator is now in the comparison with R docs. I've lumped it with the match function, since that's the page where the %in% operator appears in the R docs: http://finzi.psych.upenn.edu/R/library/base/html/match.html
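
For readers skimming the thread, here is a minimal sketch of the correspondence being documented, assuming the R snippet `s <- 0:4; s %in% c(2, 4)` from the diff. The variable names are illustrative, and `Series.isin` is used here rather than the `DataFrame.isin` the patch links to:

```python
import numpy as np
import pandas as pd

# Analogue of R's `s <- 0:4`
s = pd.Series(np.arange(5))

# Analogue of R's `s %in% c(2, 4)`: a boolean mask, one entry per element
mask = s.isin([2, 4])
print(mask.tolist())  # [False, False, True, False, True]

# The mask can be used directly to select the matching values
print(s[mask].tolist())  # [2, 4]
```

As in R, the result is a logical vector of the same length as the input, not the positions of the matches.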


.. ipython:: python

   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
Member

Why the np.arange(5)[::-1]? Doesn't that just make it more complicated for the reader to understand?

Contributor Author

Yeah it would, I was thinking of something...strange.
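
To illustrate the point about the reversed index, here is a small sketch (the names `plain` and `flipped` are hypothetical, not from the patch). It suggests that the custom index only changes the labels on the result, not the membership test itself, so dropping it should not lose anything for the docs example:

```python
import numpy as np
import pandas as pd

# Default RangeIndex 0..4
plain = pd.Series(np.arange(5), dtype=np.float32)

# Same values, labels reversed to 4..0 as in the original diff
flipped = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype=np.float32)

# isin tests values, not labels, so both masks hold the same booleans
print(plain.isin([2, 4]).tolist())    # [False, False, True, False, True]
print(flipped.isin([2, 4]).tolist())  # [False, False, True, False, True]

# Only the labels attached to the result differ
print(flipped.isin([2, 4]).index.tolist())  # [4, 3, 2, 1, 0]
```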

On 8 January 2014 22:01, Joris Van den Bossche notifications@github.com wrote:

In doc/source/comparison_with_r.rst:

+~~~~~~~~~~~~
+
+A common way to select data in R is using %in% which is defined using the
+function match. The operator %in% is used to return a logical vector
+indicating if there is a match or not:
+
+.. code-block:: r
+
+   s <- 0:4
+   s %in% c(2,4)
+
+The :meth:`~pandas.DataFrame.isin` method is similar to R %in% operator:
+
+.. ipython:: python
+
+   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)

Why the np.arange(5)[::-1]? Doesn't that just make it more complicated for
the reader to understand?



Chapman

.. ipython:: python

   s = pd.Series(np.arange(5),dtype=np.float32)
   s.apply(lambda x: [2, 4].index(x) if x in [2,4] else np.nan)
Member

There is actually a pd.match(s, [2, 4]) function which does exactly this. Although, I don't know to which extent it should be advertised, as it is nowhere in the docs and is maybe also not in the best shape? @y-p @jreback ?

You're right. pd.match() ignores its na_sentinel argument, but otherwise it's a 1:1 match.

The _hashtable_algo function it uses seems to be missing at least the int32 case, btw.

Contributor

hmm...looks kind of like Index.get_indexer (and the non_unique version). Never even knew this existed. Maybe we should open an issue to see the use case, and then either document or deprecate it? I don't see it being used anywhere internally.
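
A rough sketch of the `Index.get_indexer` comparison, assuming the lookup values `[2, 4]` from the diff; the names `via_apply`, `positions` and `via_indexer` are made up for illustration. It is not a claim about how `pd.match` is implemented, only that `get_indexer` can produce the same positions as the apply-based example:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), dtype=np.float32)

# The apply-based version from the diff under review
via_apply = s.apply(lambda x: [2, 4].index(x) if x in [2, 4] else np.nan)

# get_indexer returns, for each value of `s`, its position in the index,
# or -1 when the value is not present
positions = pd.Index([2, 4]).get_indexer(s)
print(positions.tolist())  # [-1, -1, 0, -1, 1]

# Replace the -1 sentinel with NaN to mirror R's match(), which yields NA
via_indexer = pd.Series(positions, index=s.index).where(positions >= 0)
print(via_indexer.equals(via_apply))
```

Note that R's `match` is 1-based while both versions above are 0-based.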

Contributor

@chappers

replace this with

Series(pd.match(s,[2,4],np.nan)) (which now works), see #5943

jreback added a commit that referenced this pull request Jan 15, 2014
DOC: Add %in% operator into compare w r (GH3850)
@jreback jreback merged commit 14f4a78 into pandas-dev:master Jan 15, 2014
Contributor

jreback commented Jan 15, 2014

@chappers thanks again!

Successfully merging this pull request may close these issues.

DOC: make isin section in indexing.rst more prominent, maybe add to 10min.rst