regex=False as default behavior, and expand regex disabling options #7563

Closed
@davclark

I was very happy to see PR #5879, which allows disabling the auto-regex feature for functions in the .str namespace. Doing something about this has been on my todo list for a while. Automatic promotion of strings to regular expressions is one of the two big points of difficulty for beginners using pandas (in my experience, anyway). The most obvious problem comes up when you have literal dollar signs you wish to operate on.
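To make the dollar-sign problem concrete, here is a minimal sketch. It assumes `Series.str.contains` and its `regex` keyword; the sample data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two entries contain a literal dollar sign.
prices = pd.Series(["$5", "5 dollars", "$12.50"])

# Default behavior: the pattern is compiled as a regular expression,
# so "$" means "end of string" and matches every row.
as_regex = prices.str.contains("$")

# regex=False: the pattern is treated as a plain substring.
as_literal = prices.str.contains("$", regex=False)

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [True, False, True]
```

The regex result is not an error, just silently wrong, which is what makes it hard for a beginner to diagnose.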

I'd like to argue that the faster, less confusing regex=False be the default.

This is a trivial thing to do, and if folks are willing to accept it, I'm happy to submit a pull request.

I also ran into this problem when using Unicode delimiters with the read_* family of functions:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.

This is another case where we're losing performance and increasing confusion. So I'd like to extend the option to these functions as well, and could submit it all as one pull request or as two, if that seems reasonable.
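For readers hitting the same warning, one possible workaround today is to escape the separator by hand and opt into the python engine explicitly. This is only a sketch under assumptions: the `||` delimiter and sample data are invented, and it uses a multi-character ASCII separator rather than the Unicode delimiter that triggered the warning above.

```python
import io
import re

import pandas as pd

# Hypothetical input using a literal two-character "||" delimiter.
raw = "a||b\n1||2\n"

# A multi-character separator is interpreted as a regular expression,
# so the literal delimiter has to be escaped manually. Passing
# engine="python" up front avoids the ParserWarning about falling back.
frame = pd.read_csv(io.StringIO(raw), sep=re.escape("||"), engine="python")

print(list(frame.columns))
print(frame.iloc[0].tolist())
```

A literal-separator option would make both the escaping and the engine switch unnecessary in a case like this.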

Labels: Enhancement, Strings (String extension data type and string data)
