-
Notifications
You must be signed in to change notification settings - Fork 50
Emit custom character classes like an alternation #590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- static vars in payloads - Clean up _CharacterClassModel - Use the model for bytecodegen and consumer interface - Merge the grapheme and scalar match builtin cases together
- Removes the main consumer interface for ccc - Removes a lot of the consumer interface code required for ccc - Adds an optimization for collecting the ascii parts of a ccc - Use normal matching code in a CCC
/// We allow trivia into CustomCharacterClass, which could result in a CCC that matches nothing | ||
/// ie (?x)[ ] | ||
var guaranteesForwardProgress: Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should reject (?x)[ ]
at parse time (we should reject a character class with no semantic members). That being said, it is possible to have a custom character class that matches nothing, e.g [a--a]
(though as I understand it, that would still guarantee forward progress).
This is based heavily off the work in swiftlang#590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in swiftlang#590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in swiftlang#590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in swiftlang#590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in swiftlang#590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
Superseded by #660. |
This is based heavily off the work in #590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in #590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in #590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in #590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
This is based heavily off the work in #590, rebased onto main, with some changes to remove even more consumer uses. Consumer functions only have two remaining uses: non-ASCII ranges and Unicode lookups (for things like general category, binary properties, name, etc.). This change primarily treats custom character classes as alternations around their contents, with set operations emitted as instructions instead of implemented via consumer function.
I thought this would be relatively easy to implement and I was curious what the performance benefits were so I implemented it quickly
A number of regressions in benchmarks without any code that hits any of the changes (most of these don't even have a custom character class). Most likely due to the instruction I added?
As expected
EmailRFCNoMatches
is much faster now that it's non-ascii CCC has it's ascii members collected into a bitsetThe big benefit is being able to remove all the code in
ConsumerInterface
that did matching on things we already have instructions for (characters, scalars, case insensitive, etc)Based on #547