
Command line interface for the re module #108095

Open
@serhiy-storchaka

Description


Feature or enhancement

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/command-line-interface-for-the-re-module/31819

Proposal:

I propose to add a module, re.grep, which provides a CLI for the re module. It mostly emulates the GNU grep utility.
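The help text that follows has the layout produced by Python's argparse. Here is a minimal sketch, assuming argparse, of how a few of these options could be declared. The function name `build_parser` is hypothetical and most options are left out. Because `-h` means `--no-filename` here, argparse's automatic help option has to be disabled with `add_help=False` and re-added as `--help` only.

```python
import argparse

def build_parser():
    """Hypothetical sketch covering only a subset of the proposed options."""
    # add_help=False frees -h, which this CLI uses for --no-filename.
    parser = argparse.ArgumentParser(prog='grep.py', add_help=False)
    parser.add_argument('--help', action='help',
                        help='show this help message and exit')
    # -e may be repeated; each occurrence appends one pattern to the list.
    parser.add_argument('-e', '--regexp', dest='patterns', action='append',
                        default=[], metavar='PATTERN',
                        help='Use PATTERN as the pattern.')
    # -i and --no-ignore-case share one destination; the last one given wins.
    parser.add_argument('-i', '--ignore-case', action='store_true')
    parser.add_argument('--no-ignore-case', dest='ignore_case',
                        action='store_false')
    # None means "neither -H nor -h given", so the default can later depend
    # on how many files are searched.
    parser.add_argument('-H', '--with-filename', dest='with_filename',
                        action='store_true', default=None)
    parser.add_argument('-h', '--no-filename', dest='with_filename',
                        action='store_false')
    parser.add_argument('-n', '--line-number', action='store_true')
    parser.add_argument('files', nargs='*', metavar='FILES',
                        help='Files to search.')
    return parser
```

For example, `build_parser().parse_args(['-e', 'foo', '-e', 'bar', '-h', 'a.txt'])` yields `patterns == ['foo', 'bar']` and `with_filename is False`.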

usage: grep.py [--help] [-e PATTERN] [-f PATTERN_FILE] [-F] [-i]
               [--no-ignore-case] [-v] [-w] [-x] [-c] [-L] [-l] [-o] [-q] [-H]
               [-h] [-n] [-A NUM] [-B NUM] [-C NUM]
               ...

positional arguments:
  FILES                 Files to search.

options:
  --help                show this help message and exit

Matching Control:
  -e PATTERN, --regexp PATTERN
                        Use PATTERN as the pattern.
  -f PATTERN_FILE, --file PATTERN_FILE
                        Obtain patterns from PATTERN_FILE, one per line.
  -F, --fixed-strings   Interpret patterns as fixed strings.
  -i, --ignore-case     Ignore case distinctions in patterns and input data.
  --no-ignore-case      Do not ignore case distinctions in patterns and input
                        data. This is the default.
  -v, --invert-match    Invert the sense of matching, to select non-matching
                        lines.
  -w, --word-regexp     Select only those lines containing matches that form
                        whole words.
  -x, --line-regexp     Select only those matches that exactly match the whole
                        line.

Output Control:
  -c, --count           Suppress normal output; instead print a count of
                        matching lines for each input file.
  -L, --files-without-match
                        Suppress normal output; instead print the name of each
                        input file from which no output would normally have
                        been printed.
  -l, --files-with-match
                        Suppress normal output; instead print the name of each
                        input file from which output would normally have been
                        printed.
  -o, --only-matching   Print only the matched (non-empty) parts of a matching
                        line, with each such part on a separate output line.
  -q, --quiet           Quiet; do not write anything to standard output. Exit
                        immediately with zero status if any match is found.

Output Line Prefix Control:
  -H, --with-filename   Print the file name for each match. This is the
                        default when there is more than one file to search.
  -h, --no-filename     Suppress the prefixing of file names on output. This
                        is the default when there is only one file (or only
                        standard input) to search.
  -n, --line-number     Prefix each line of output with the 1-based line
                        number within its input file.

Context Line Control:
  -A NUM, --after-context NUM
                        Print NUM lines of trailing context after matching
                        lines.
  -B NUM, --before-context NUM
                        Print NUM lines of leading context before matching
                        lines.
  -C NUM, --context NUM
                        Print NUM lines of output context.
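To illustrate how the Matching Control, Output Line Prefix Control, and Context Line Control options could map onto the re module, here is a minimal sketch. It is not the code from the proposal. The helper names `make_matcher`, `prefix`, and `with_context` are hypothetical. `-w` is approximated with lookarounds. Lines are assumed to have their trailing newline stripped.

```python
import re
from collections import deque

def make_matcher(pattern, fixed=False, ignore_case=False,
                 word=False, line=False, invert=False):
    """Hypothetical helper: turn one pattern plus the Matching Control
    options into a predicate on a single line (newline already stripped)."""
    if fixed:
        pattern = re.escape(pattern)                      # -F
    if word:
        # -w (approximation): no word character directly before or after.
        pattern = rf'(?<!\w)(?:{pattern})(?!\w)'
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)  # -i
    test = regex.fullmatch if line else regex.search                   # -x
    return lambda text: (test(text) is not None) != invert            # -v

def prefix(text, filename=None, lineno=None, sep=':'):
    """Hypothetical helper for the -H/-n prefixes. GNU grep puts ':' after
    the fields of matching lines and '-' after those of context lines."""
    fields = [f for f in (filename, lineno) if f is not None]
    return ''.join(f'{f}{sep}' for f in fields) + text

def with_context(is_match, lines, before=0, after=0):
    """Yield (lineno, text, matched) for selected lines plus -B/-A context.
    -C NUM corresponds to before=after=NUM. The '--' separator GNU grep
    prints between non-adjacent groups is omitted in this sketch."""
    pending = deque(maxlen=before)  # most recent unprinted lines
    after_left = 0                  # trailing context lines still to print
    for lineno, text in enumerate(lines, 1):
        if is_match(text):
            while pending:          # flush leading context first
                n, t = pending.popleft()
                yield n, t, False
            yield lineno, text, True
            after_left = after
        elif after_left > 0:
            yield lineno, text, False
            after_left -= 1
        else:
            pending.append((lineno, text))

# Usage: roughly `-i -w -B 1 -H -n -e beta` on an in-memory "file".
lines = ['alpha', 'Beta', 'gamma', 'beta-max', 'delta']
match = make_matcher('beta', ignore_case=True, word=True)
for n, text, hit in with_context(match, lines, before=1):
    print(prefix(text, 'demo.txt', n, ':' if hit else '-'))
# demo.txt-1-alpha
# demo.txt:2:Beta
# demo.txt-3-gamma
# demo.txt:4:beta-max
```

The bounded `deque` keeps at most NUM candidate lines for `-B`, so memory use does not grow with file size. Wrapping the pattern in `(?:...)` for `-w` would break a pattern that starts with a global inline flag such as `(?x)`, since Python 3.11 requires those at the very start, so a full implementation would have to handle that case.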

Main differences from GNU grep:

  • Obviously, only Python regular expression syntax is supported, no other syntax.
  • -e cannot be omitted when a pattern is given on the command line. This is for simplicity.
  • -e specifies a single pattern, not a newline-separated list of patterns as in grep. The latter behavior would be very easy to implement, but why bother? You can specify -e multiple times. In scripts, you can also use verbose mode (?x).
  • When multiple patterns are specified, -o outputs the match of the first matching pattern, not the longest match as grep does. The latter behavior would be easy to implement while this remains a simple Python implementation in a single file, but I want to add support for unions of patterns to re, and that will have the former behavior.
  • No recursive searching in directories with filename filtering yet. First I need to implement some features in fnmatch and grep so as not to duplicate that code here.
  • No color output. Maybe in the distant future, when we add a simple interface for colored terminal output.
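One reading of the "first matching pattern" behavior of -o is ordinary Python alternation: at a given position, the alternatives are tried in order and the first one that matches wins. The sketch below assumes that reading and joins the -e patterns with `|`. The helpers `union` and `only_matching` are hypothetical, not the proposal's code. It also shows verbose mode `(?x)` as an alternative to newline-separated patterns.

```python
import re

def union(patterns):
    """Hypothetical helper: combine several -e patterns into one regex.
    Alternatives are tried in order, so the earliest pattern that
    matches at a given position wins."""
    return re.compile('|'.join(f'(?:{p})' for p in patterns))

def only_matching(regex, line):
    # -o: every non-empty match, each one on its own output line.
    return [m.group() for m in regex.finditer(line) if m.group()]

print(only_matching(union(['ab', 'abc']), 'abcd'))  # ['ab']
print(only_matching(union(['abc', 'ab']), 'abcd'))  # ['abc']
# Per the proposal, GNU grep -o would report the longest match, "abc",
# regardless of the order of the -e options.

# Verbose mode: one pattern spread over several lines, with comments.
verbose = re.compile(r'''(?x)
      foo        # first alternative
    | ba[rz]     # second alternative
''')
print(only_matching(verbose, 'foo bar baz qux'))  # ['foo', 'bar', 'baz']
```

Simply joining patterns has caveats that a real union feature in re would need to address. Capture groups are renumbered across patterns, so a backreference such as `\1` in a later pattern can end up pointing at an earlier pattern's group. A pattern that starts with a global inline flag can no longer be wrapped.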
