Commit c13d54f

Implement PCRE2_EXTRA_CASELESS_RESTRICT and related features
1 parent fcceddc commit c13d54f

14 files changed, with 764 additions and 117 deletions

ChangeLog

Lines changed: 7 additions & 0 deletions
@@ -36,6 +36,13 @@ configure.ac and CMakeLists.txt.
 8. Fixed a bug in pcre2test when a ridiculously large string repeat required a
 stupid amount of memory. It now gives a clean realloc() failure error.

+9. Updates to restrict the interaction between ASCII and non-ASCII characters
+for caseless matching and items like \d:
+
+(a) Added PCRE2_EXTRA_CASELESS_RESTRICT to lock out mixing of ASCII and
+non-ASCII when matching caselessly. This is also /r in pcre2test and
+(?r) within patterns.
+

 Version 10.42 11-December-2022
 ------------------------------
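
The ChangeLog entry above describes locking out ASCII/non-ASCII mixing in caseless matching. A minimal Python sketch of that idea follows. It is an illustration only, not PCRE2's implementation: the function name `caseless_equal` and its `restrict` parameter are hypothetical, and Python's `str.casefold()` stands in for Unicode case equivalence.

```python
def caseless_equal(a: str, b: str, restrict: bool = False) -> bool:
    """Compare two single characters caselessly.

    With restrict=True, an ASCII character never matches a non-ASCII one.
    This mimics the effect the ChangeLog describes; it is not PCRE2 code.
    """
    if restrict and (ord(a) < 128) != (ord(b) < 128):
        return False
    return a.casefold() == b.casefold()


KELVIN = "\u212a"  # KELVIN SIGN, which case-folds to ASCII "k"

unrestricted = caseless_equal("k", KELVIN)               # crosses the boundary
restricted = caseless_equal("k", KELVIN, restrict=True)  # locked out
ascii_pair = caseless_equal("K", "k", restrict=True)     # both ASCII
```

Pairs that lie entirely on one side of the boundary are unaffected in this sketch; only matches that cross from ASCII to non-ASCII (or back) are refused.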

HACKING

Lines changed: 7 additions & 3 deletions
@@ -1,4 +1,4 @@
-Technical Notes about PCRE2
+Technical notes about PCRE2
 ---------------------------

 These are very rough technical notes that record potentially useful information
@@ -248,7 +248,6 @@ by a length and an offset into the pattern to specify the name.
 The following have one data item that follows in the next vector element:

 META_BIGVALUE Next is a literal >= META_END
-META_OPTIONS (?i) and friends (data is new option bits)
 META_POSIX POSIX class item (data identifies the class)
 META_POSIX_NEG negative POSIX class item (ditto)

@@ -298,6 +297,11 @@ META_MINMAX {n,m} repeat
 META_MINMAX_PLUS {n,m}+ repeat
 META_MINMAX_QUERY {n,m}? repeat

+This one is followed by two elements, giving the new option settings for the
+main and extra options, respectively.
+
+META_OPTIONS (?i) and friends
+
 This one is followed by three elements. The first is 0 for '>' and 1 for '>=';
 the next two are the major and minor numbers:
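
The hunk above moves META_OPTIONS from the one-data-item group to a new group with two data items (main options, then extra options). The sketch below shows how a consumer of such a flat vector might step over the data elements. The numeric codes, the `DATA_ITEMS` table, and the `walk` function are all hypothetical, invented for this illustration; the real values live in the PCRE2 sources.

```python
# Hypothetical code values, chosen only for this sketch.
META_BIGVALUE = 0x8001
META_OPTIONS = 0x8002

# Number of data elements that follow each META code in the vector.
DATA_ITEMS = {META_BIGVALUE: 1, META_OPTIONS: 2}


def walk(parsed):
    """Yield (code, data) pairs from a flat parsed-pattern list."""
    i = 0
    while i < len(parsed):
        code = parsed[i]
        n = DATA_ITEMS.get(code, 0)  # plain literals carry no data
        yield code, tuple(parsed[i + 1 : i + 1 + n])
        i += 1 + n


# A literal, then META_OPTIONS with (main_bits, extra_bits), then a literal.
items = list(walk([ord("a"), META_OPTIONS, 0x08, 0x80, ord("b")]))
```

Keeping the element counts in one table means a change like this one only needs the count for META_OPTIONS updated from 1 to 2.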

@@ -827,4 +831,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
 opcode are the correct length, in order to catch updating errors.

 Philip Hazel
-April 2022
+January 2023

maint/GenerateUcd.py

Lines changed: 12 additions & 0 deletions
@@ -109,6 +109,8 @@
 # 10-January-2022: Addition of general Boolean property support
 # 12-January-2022: Merge scriptx and bidiclass fields
 # 14-January-2022: Enlarge Boolean property offset to 12 bits
+# 28-January-2023: Remove ASCII "other case" from non-ASCII character that
+# are present in caseless sets.
 #
 # ----------------------------------------------------------------------------
 #
@@ -710,6 +712,16 @@ def write_bitsets(list, item_size):

 # End of block of code for creating offsets for caseless matching sets.

+# Scan the caseless sets, and for any non-ASCII character that has an ASCII
+# character as its "base" other case, remove the other case. This makes it
+# easier to handle those characters when the PCRE2 option for not mixing ASCII
+# and non-ASCII is enabled. In principle one should perhaps scan for a
+# non-ASCII alternative, but in practice these don't exist.
+
+for s in caseless_sets:
+  for x in s:
+    if x > 127 and x + other_case[x] < 128:
+      other_case[x] = 0

 # Combine all the tables
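
To see what the added loop does, here it is run on toy data. The condition `x + other_case[x] < 128` implies that `other_case` holds offsets to the other-case code point. The wrapper function `strip_ascii_other_case` and the sample table are assumptions made for this sketch; the table is a small dict rather than the script's full data, and only the loop body comes from the diff.

```python
def strip_ascii_other_case(caseless_sets, other_case):
    """Zero the other-case offset of non-ASCII characters that point at ASCII."""
    for s in caseless_sets:
        for x in s:
            if x > 127 and x + other_case[x] < 128:
                other_case[x] = 0
    return other_case


# Toy data: K (U+004B), k (U+006B) and KELVIN SIGN (U+212A) form one set.
# Each value is an offset from the character to its other case.
other_case = {
    0x4B: 0x20,             # K -> k
    0x6B: -0x20,            # k -> K
    0x212A: 0x6B - 0x212A,  # KELVIN SIGN -> k (an ASCII target)
}
caseless_sets = [[0x4B, 0x6B, 0x212A]]

strip_ascii_other_case(caseless_sets, other_case)
```

After the call, only the non-ASCII member has lost its other case; the ASCII pair still points at each other.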

maint/ucptest.c

Lines changed: 1 addition & 1 deletion
@@ -471,7 +471,7 @@ switch(bidi)
 printf("U+%04X %s %s: %s, %s, %s", c, bidiclass, typename, fulltypename,
 scriptname, graphbreak);

-if (is_just_one && othercase != c)
+if (is_just_one && (othercase != c || caseset != 0))
 {
 printf(", U+%04X", othercase);
 if (caseset != 0)

src/pcre2.h.in

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, second API, to be
 #included by applications that call PCRE2 functions.

-Copyright (c) 2016-2021 University of Cambridge
+Copyright (c) 2016-2023 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -153,6 +153,7 @@ D is inspected during pcre2_dfa_match() execution
 #define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
 #define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */
 #define PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK 0x00000040u /* C */
+#define PCRE2_EXTRA_CASELESS_RESTRICT 0x00000080u /* C */

 /* These are for pcre2_jit_compile(). */
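
The new define takes the next free bit in the extra-options word. As a rough illustration of how these bits combine, the sketch below mirrors the four values visible in this hunk in a Python `IntFlag`. The class name `ExtraOption` is hypothetical and this is not a PCRE2 binding; only the numeric values come from the header.

```python
from enum import IntFlag


class ExtraOption(IntFlag):
    """Mirror of the extra-option bits shown in the hunk (values from the diff)."""
    ESCAPED_CR_IS_LF = 0x00000010
    ALT_BSUX = 0x00000020
    ALLOW_LOOKAROUND_BSK = 0x00000040
    CASELESS_RESTRICT = 0x00000080


# Options are OR-ed together into a single word and tested with a bit mask.
opts = ExtraOption.ALT_BSUX | ExtraOption.CASELESS_RESTRICT
has_restrict = ExtraOption.CASELESS_RESTRICT in opts
```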
