From f00bd0d4978e81063dc54b84db606162afbd9f11 Mon Sep 17 00:00:00 2001
From: Cheryl Sabella
Date: Sun, 21 Jan 2018 17:48:30 -0500
Subject: [PATCH 1/4] bpo-32614: Modify re examples to use a raw string to prevent warning

---
 Doc/howto/regex.rst   | 10 +++++-----
 Doc/howto/unicode.rst |  2 +-
 Doc/library/re.rst    |  4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index 87a6b1aba59f9f..7e9bce031cdae0 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -457,7 +457,7 @@ In actual programs, the most common style is to store the
 Two pattern methods return all of the matches for a pattern.
 :meth:`~re.Pattern.findall` returns a list of matching strings::
 
-   >>> p = re.compile('\d+')
+   >>> p = re.compile(r'\d+')
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
@@ -1096,11 +1096,11 @@ following calls::
 The module-level function :func:`re.split` adds the RE to be used as the first
 argument, but is otherwise the same. ::
 
-   >>> re.split('[\W]+', 'Words, words, words.')
+   >>> re.split(r'[\W]+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
-   >>> re.split('([\W]+)', 'Words, words, words.')
+   >>> re.split(r'([\W]+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
-   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   >>> re.split(r'[\W]+', 'Words, words, words.', 1)
    ['Words', 'words, words.']
 
 
@@ -1233,7 +1233,7 @@ either side. This takes the job beyond :meth:`!replace`'s abilities.)
 
 Another common task is deleting every occurrence of a single character from a
 string or replacing it with another single character. You might do this with
-something like ``re.sub('\n', ' ', S)``, but :meth:`~str.translate` is capable of
+something like ``re.sub(r'\n', ' ', S)``, but :meth:`~str.translate` is capable of
 doing both tasks and will be faster than any regular expression operation can be.
 
 
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index d4b8f8d2204ab7..093f4454af1d43 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -463,7 +463,7 @@ The string in this example has the number 57 written in both Thai and
 Arabic numerals::
 
    import re
-   p = re.compile('\d+')
+   p = re.compile(r'\d+')
 
    s = "Over \u0e55\u0e57 57 flavours"
    m = p.search(s)
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 9b175f4e96756b..3d249de140b64f 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -345,7 +345,7 @@ The special characters are:
 
    This example looks for a word following a hyphen:
 
-      >>> m = re.search('(?<=-)\w+', 'spam-egg')
+      >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
       >>> m.group(0)
       'egg'
 
@@ -1398,7 +1398,7 @@ into a list with each nonempty line having its own entry:
 .. doctest::
    :options: +NORMALIZE_WHITESPACE
 
-   >>> entries = re.split("\n+", text)
+   >>> entries = re.split(r"\n+", text)
    >>> entries
    ['Ross McFluff: 834.345.1254 155 Elm Street',
    'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
From cf437f8fa5636c530ee6d2cef2a0b2283c5cf23f Mon Sep 17 00:00:00 2001
From: Cheryl Sabella
Date: Sat, 27 Jan 2018 18:28:33 -0500
Subject: [PATCH 2/4] Remove raw strings from \n and clarify usage in HOWTO

---
 Doc/howto/regex.rst | 16 +++++++++++++++-
 Doc/library/re.rst  |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index 7e9bce031cdae0..d227838be6b369 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -289,6 +289,8 @@ Putting REs in strings keeps the Python language simpler, but has one
 disadvantage which is the topic of the next section.
 
 
+.. _the-backslash-plague:
+
 The Backslash Plague
 --------------------
 
@@ -327,6 +329,12 @@ backslashes are not handled in any special way in a string literal prefixed with
 while ``"\n"`` is a one-character string containing a newline. Regular
 expressions will often be written in Python code using this raw string notation.
 
+In addition, special escape sequences that are valid in ``re``, but not valid
+as Python string literals, now result in a :exc:`DeprecationWarning` and will
+eventually become a :exc:`SyntaxError`, which means the sequences will be
+invalid if raw string notation or escaping the backslashes isn't used.
+
+
 +-------------------+------------------+
 | Regular String    | Raw string       |
 +===================+==================+
@@ -461,6 +469,12 @@ Two pattern methods return all of the matches for a pattern.
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
+The `r` prefix, making the literal a raw string literal, is needed in this
+example because escape sequences in a normal "cooked" string literal that are
+not recognized by Python, as opposed to ``re``, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
+:ref:`the-backslash-plague`.
+
 :meth:`~re.Pattern.findall` has to create the entire list before it can be
 returned as the result. The :meth:`~re.Pattern.finditer` method returns a
 sequence of :ref:`match object <match-objects>` instances as an :term:`iterator`::
@@ -1233,7 +1247,7 @@ either side. This takes the job beyond :meth:`!replace`'s abilities.)
 
 Another common task is deleting every occurrence of a single character from a
 string or replacing it with another single character. You might do this with
-something like ``re.sub(r'\n', ' ', S)``, but :meth:`~str.translate` is capable of
+something like ``re.sub('\n', ' ', S)``, but :meth:`~str.translate` is capable of
 doing both tasks and will be faster than any regular expression operation can be.
 
 
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 3d249de140b64f..83ebe7db01ad96 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -1398,7 +1398,7 @@ into a list with each nonempty line having its own entry:
 .. doctest::
    :options: +NORMALIZE_WHITESPACE
 
-   >>> entries = re.split(r"\n+", text)
+   >>> entries = re.split("\n+", text)
    >>> entries
    ['Ross McFluff: 834.345.1254 155 Elm Street',
    'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
From 0fccac3c02a787178da6262ed07a1f1897bae67d Mon Sep 17 00:00:00 2001
From: Cheryl Sabella
Date: Fri, 2 Feb 2018 07:43:20 -0500
Subject: [PATCH 3/4] Add blurb and correct markup

---
 Doc/howto/regex.rst                                 | 13 +++++++------
 .../2018-02-02-07-41-57.bpo-32614.LSqzGw.rst        |  3 +++
 2 files changed, 10 insertions(+), 6 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst

diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index d227838be6b369..bdf687ee455140 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -329,10 +329,11 @@ backslashes are not handled in any special way in a string literal prefixed with
 while ``"\n"`` is a one-character string containing a newline. Regular
 expressions will often be written in Python code using this raw string notation.
 
-In addition, special escape sequences that are valid in ``re``, but not valid
-as Python string literals, now result in a :exc:`DeprecationWarning` and will
-eventually become a :exc:`SyntaxError`, which means the sequences will be
-invalid if raw string notation or escaping the backslashes isn't used.
+In addition, special escape sequences that are valid in regular expressions,
+but not valid as Python string literals, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
+which means the sequences will be invalid if raw string notation or escaping
+the backslashes isn't used.
 
 
 +-------------------+------------------+
@@ -469,9 +470,9 @@ Two pattern methods return all of the matches for a pattern.
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
-The `r` prefix, making the literal a raw string literal, is needed in this
+The ``r`` prefix, making the literal a raw string literal, is needed in this
 example because escape sequences in a normal "cooked" string literal that are
-not recognized by Python, as opposed to ``re``, now result in a
+not recognized by Python, as opposed to regular expressions, now result in a
 :exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
 :ref:`the-backslash-plague`.
 
diff --git a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
new file mode 100644
index 00000000000000..cc594390bb86ec
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
@@ -0,0 +1,3 @@
+Modify RE examples in documentation to use raw strings to prevent
+:exc:``DeprecationWarning`` and add text to REGEX HOWTO to highlight the
+deprecation.
From 8c77a8b70c561eeb205ab4773c7df9e2f536c173 Mon Sep 17 00:00:00 2001
From: Cheryl Sabella
Date: Fri, 2 Feb 2018 07:47:03 -0500
Subject: [PATCH 4/4] Update 2018-02-02-07-41-57.bpo-32614.LSqzGw.rst

---
 .../next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
index cc594390bb86ec..9e9f3e3a74df3b 100644
--- a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
+++ b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
@@ -1,3 +1,3 @@
 Modify RE examples in documentation to use raw strings to prevent
-:exc:``DeprecationWarning`` and add text to REGEX HOWTO to highlight the
+:exc:`DeprecationWarning` and add text to REGEX HOWTO to highlight the
 deprecation.
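
Reviewer note (not part of the patch): the behaviour this series documents can be checked with a small sketch like the one below. It is illustrative only. The helper name `escape_warnings` and the `"<example>"` filename are made up for this note, and the exact warning category is an assumption that depends on the interpreter: the patch text describes a `DeprecationWarning`, while newer Python versions may report a different category, so the sketch only checks whether any warning was raised.

```python
import re
import warnings


def escape_warnings(source):
    """Compile `source` and return the warnings the compiler raised.

    Hypothetical helper for this note; it is not part of CPython or the patch.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no filter hides a warning
        compile(source, "<example>", "exec")
    return caught


# '\d' is not a Python string escape, so a cooked literal triggers a warning.
cooked = escape_warnings("p = '\\d+'")
# With the r prefix the backslash is kept verbatim and nothing is reported.
raw = escape_warnings("p = r'\\d+'")
print(len(cooked) > 0, len(raw) == 0)

# '\n' is a valid Python escape: the cooked literal hands re a real newline,
# the raw literal hands it backslash + 'n', and re matches a newline either way.
text = "first\n\nsecond"
print(re.split("\n+", text) == re.split(r"\n+", text))
```

This is why PATCH 2/4 can drop the `r` prefix from the `\n` examples without reintroducing the warning, while the `\d` and `\W` examples from PATCH 1/4 keep it.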