feat: Support escaped string literals (PostgreSQL) #502

Merged
merged 9 commits into from
May 25, 2022

Conversation

Contributor

@ovr ovr commented May 18, 2022

Hello!

It's a draft which implements special PostgreSQL escaped string syntax.

https://www.postgresql.org/docs/8.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g. E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represents a special byte value. \b is a backspace, \f is a form feed, \n is a newline, \r is a carriage return, \t is a tab.
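
To make the quoted rules concrete, here is a minimal sketch of decoding the body of an escape string. The function name `unescape_pg` is hypothetical and is not taken from this PR. It covers only the five single-character escapes listed above, takes any other character after a backslash literally, and does not attempt octal or hexadecimal escapes.

```rust
/// Hypothetical helper (not from the PR): decodes the text between the quotes
/// of an E'...' literal, handling only \b, \f, \n, \r and \t.
fn unescape_pg(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash was seen: interpret the character that follows it.
        match chars.next() {
            Some('b') => out.push('\u{0008}'), // backspace
            Some('f') => out.push('\u{000C}'), // form feed
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            // Any other character is taken literally, so \\ yields one backslash.
            Some(other) => out.push(other),
            // Assumption of this sketch: a trailing lone backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```

Under these assumptions, `unescape_pg(r"s1 \n s1")` yields a string containing a real newline, while `unescape_pg(r"s2 \\n s2")` yields a backslash followed by the letter n.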

Thanks

Signed-off-by: Dmitry Patsura <talk@dmtry.me>
Contributor Author

ovr commented May 18, 2022

@alamb could you take a look and verify that this is the correct approach? Thanks.

coveralls commented May 18, 2022

Pull Request Test Coverage Report for Build 2378670235

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 65 of 92 (70.65%) changed or added relevant lines in 4 files are covered.
  • 553 unchanged lines in 6 files lost coverage.
  • Overall coverage decreased (-0.9%) to 89.635%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/parser.rs | 3 | 5 | 60.0% |
| src/ast/value.rs | 11 | 19 | 57.89% |
| src/tokenizer.rs | 29 | 46 | 63.04% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| tests/sqlparser_redshift.rs | 1 | 98.11% |
| tests/sqlparser_snowflake.rs | 2 | 96.43% |
| tests/sqlparser_postgres.rs | 16 | 97.83% |
| tests/sqlparser_common.rs | 69 | 97.01% |
| src/ast/mod.rs | 156 | 78.21% |
| src/parser.rs | 309 | 82.96% |

| Totals | |
|---|---|
| Change from base Build 2328175823 | -0.9% |
| Covered Lines | 8388 |
| Relevant Lines | 9358 |

💛 - Coveralls

Contributor

alamb commented May 19, 2022 via email

@ovr ovr marked this pull request as ready for review May 20, 2022 14:33
Contributor

@alamb alamb left a comment

Thank you @ovr -- I like this PR; very nice.

The only comment I think needs to be addressed prior to merge is the comment on EscapeEscapedStringLiteral

src/ast/value.rs Outdated

impl<'a> fmt::Display for EscapeEscapedStringLiteral<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let mut is_escaped = true;
Contributor

I don't see is_escaped ever set to false -- I would expect it would start with is_escaped = true and then is_escaped would be set to false after each character was written

Contributor

How would this work with: First \n second \\ third \n fourth \

Contributor Author

> I don't see is_escaped ever set to false -- I would expect it would start with is_escaped = true and then is_escaped would be set to false after each character was written

I will remove it; it's useless.

The second example is not a valid value, because the trailing \ would need to be escaped.
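
A minimal sketch of the stateless approach discussed in this thread, with no is_escaped flag: each character is examined on its own and written either verbatim or as its escape sequence. The names `EscapedBody` and `render_escaped` are hypothetical and are not the PR's actual types, and the set of escaped characters is an assumption of this example.

```rust
use std::fmt;

/// Hypothetical wrapper (not the PR's type): displays a string value with
/// special characters written back as backslash escape sequences.
struct EscapedBody<'a>(&'a str);

impl<'a> fmt::Display for EscapedBody<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // No state is carried between characters, so no flag is needed.
        for c in self.0.chars() {
            match c {
                '\'' => f.write_str(r"\'")?,
                '\\' => f.write_str(r"\\")?,
                '\n' => f.write_str(r"\n")?,
                '\t' => f.write_str(r"\t")?,
                '\r' => f.write_str(r"\r")?,
                other => write!(f, "{}", other)?,
            }
        }
        Ok(())
    }
}

/// Hypothetical helper: renders a value as a full E'...' literal.
fn render_escaped(value: &str) -> String {
    format!("E'{}'", EscapedBody(value))
}
```

Because a backslash in the value is always written as two backslashes, a value that ends in a backslash still produces a properly terminated literal.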

src/parser.rs Outdated
@@ -496,6 +496,10 @@ impl<'a> Parser<'a> {
expr: Box::new(self.parse_subexpr(Self::PLUS_MINUS_PREC)?),
})
}
Token::EscapedStringLiteral(_) if dialect_of!(self is PostgreSqlDialect) => {
Contributor

💯 for conditionalizing on postgres dialect

let mut s = String::new();
chars.next(); // consume the opening quote

// slash escaping
Contributor

FWIW, this example from Stack Overflow looks like it might be a nice way to avoid macro overhead (and thus code bloat): https://stackoverflow.com/questions/58551211/how-do-i-interpret-escaped-characters-in-a-string

Contributor Author

@ovr ovr May 24, 2022

It's not the same, because this function has to find the string within the query and unescape it. It looks for a single quote, which may be escaped or unescaped (an unescaped one ends the string).

In our case, strings are wrapped in single quotes, i.e. e'str'.


#[test]
fn parse_escaped_literal_string() {
let sql = r#"SELECT E's1 \n s1', E's2 \\n s2', E's3 \\\n s3', E's4 \\\\n s4', E'\''"#;
Contributor

I also recommend some negative tests like ' Foo\'

Contributor Author

That is not a valid value: the last quote is escaped, so there is no unescaped single quote left to close the string expression. See f46b07e.
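
The negative case can be illustrated with a small, self-contained scanner. This is a sketch under stated assumptions, not the PR's tokenizer: the function name `read_escaped_body` and the error text are hypothetical, the leading E is assumed to be already consumed, only a few escapes are decoded, and doubled quotes ('') are not handled.

```rust
/// Hypothetical sketch: reads the body of an escaped literal from `chars`,
/// which must be positioned at the opening single quote.
fn read_escaped_body(chars: &mut impl Iterator<Item = char>) -> Result<String, String> {
    let mut s = String::new();
    chars.next(); // consume the opening quote
    let mut is_escaped = false;
    while let Some(ch) = chars.next() {
        if is_escaped {
            // The character after a backslash never terminates the literal.
            s.push(match ch {
                'n' => '\n',
                't' => '\t',
                'r' => '\r',
                other => other,
            });
            is_escaped = false;
        } else if ch == '\\' {
            is_escaped = true;
        } else if ch == '\'' {
            // An unescaped quote closes the literal.
            return Ok(s);
        } else {
            s.push(ch);
        }
    }
    // Input ended without an unescaped closing quote.
    Err("unterminated escaped string literal".to_string())
}
```

With this sketch, the input `'Foo\'` returns an error because its only remaining quote is escaped, while `'\''` returns a string containing a single quote.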

ovr and others added 6 commits May 23, 2022 13:47
Contributor

@alamb alamb left a comment

Looks good -- thank you @ovr !

@alamb alamb merged commit 2c0886d into apache:main May 25, 2022
mcheshkov pushed a commit to cube-js/sqlparser-rs that referenced this pull request Sep 2, 2024
Signed-off-by: Dmitry Patsura <talk@dmtry.me>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Can drop this after rebase on commit 2c0886d "feat: Support escaped string literals (PostgreSQL) (apache#502)", first released in 0.18.0