Skip to content

Override globs #1189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 17, 2021
Merged

Override globs #1189

merged 10 commits into from
Sep 17, 2021

Conversation

zelch
Copy link
Contributor

@zelch zelch commented Sep 15, 2021

This is a revisit of the now closed (for perfectly good reasons) pull request #862 .

Now with test coverage.

(I wasn't expecting to get a chance to address this anytime soon, but since I spent most of they day buried in sqlc and pingcap/parser code, spending the time was much easier to justify.)


So, I'm going to start with the use case.

Our MariaDB database has a lot of KSUIDs, enough that I would be better off writing a tool to automate building the override statements than doing it by hand.

It would be nice if we could just override on type, replace 'binary(20)' with 'github.com/segmentio/ksuid.KSUID' and call it a day.

Sadly, at least the mysql parser turns 'binary(20)' into 'char' well before the override code gets it. In fact, it does this well before any of the sqlc code gets it.

That leaves doing it by column name, but as mentioned above... There are quite a lot of them.

On the other hand, our naming convention is at least vaguely straight forward. The columns all end with _ksuid, so we can automatically detect them.

But why write rules to do this, and have a huge override table, when we can just add globbing support to sqlc?

And so I did that instead. :)

This passes all of the tests, and is reasonably straight forward.

Zephaniah E. Loss-Cutler-Hull added 5 commits April 12, 2021 10:18
This lets us say things like '*.*_ksuid' should be of type
'github.com/segmentio/ksuid.KSUID', instead of having to list every
single column in every single table.

We can even say '*.*.*_ksuid' for every column, in every table, in every
schema.

Now, ideally we would do a type replace on 'binary(20)', however that
gets turned into a char for at least the mysql code, which means that we
never get to see 'binary(20)', and so can't do a replacement on it.

This makes use of the github.com/gobwas/glob package for the globbing,
and it pretty much Just Works the way you expect it to.
This doesn't get used, and so we can get rid of it instead of adapting
it for globs.
Fairly straight forward, same changes as for the other code generators.
Simple merge conflicts.
This modifies the existing overrides_go_types test cases to include
tests for override globs.
Copy link
Collaborator

@kyleconroy kyleconroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding tests and implementing this in a backwards-compatible way. A posted a few comments in the PR, but there's a fundamental issue with taking this approach. It has to do with the bug found in #957. In PostgreSQL (and I'm sure other databases), it's legal to have a * character in a table or column name.

kyle=# create table foo ("bar*foo" int);
CREATE TABLE
kyle=# \d+ foo;
                                    Table "public.foo"
 Column  |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
---------+---------+-----------+----------+---------+---------+--------------+-------------
 bar*foo | integer |           |          |         | plain   |              | 
kyle=# create table "foo*bar" (bar int);                                                                                                                                                               
CREATE TABLE
kyle=# \d "foo*bar";
              Table "public.foo*bar"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 bar    | integer |           |          | 

That means we need to use a different syntax. I'm not sure what that syntax is, but we should figure it out before we do any more working on the code side of things.

@zelch
Copy link
Contributor Author

zelch commented Sep 15, 2021

(I put the reply in the wrong place, moving it here.)

On the syntax question, I think we can solve this with documentation, and I'm not sure that we can reasonably solve it otherwise.

Specifically, \* will match a literal * instead of acting as a wildcard. (Assuming that I did my escapes properly for the markdown.)

We could, admittedly, pick a different character with other special meaning to SQL that would generally prevent it from being a valid schema, table, or column character, but given my surprise by #957, I'd have to do some digging to see if something like _ was valid SQL if properly escaped.

@kyleconroy
Copy link
Collaborator

@zelch That's a good point about escaping the wildcard. I doubt there are many people out there using tables or columns with * in the name, but I want to make sure they'll be able to use them if they are.

Looking over https://github.com/gobwas/glob, I'm actually less inclined to merge this pull request as-is. The syntax and patterns supported by this package are very powerful. I personally don't want sqlc to now have to support this many types of patterns from now on.

We'll need to find another package that implements a much smaller set of patterns with the * syntax.

@zelch
Copy link
Contributor Author

zelch commented Sep 15, 2021

What are your thoughts on https://github.com/zyedidia/glob as an alternate library? It's still supporting the full 'standard' glob patterns mind you, but it does seem to get escapes right.

Zephaniah E. Loss-Cutler-Hull added 5 commits September 15, 2021 15:04
No more glob.Glob in the FQN.

We're isolating the matching in a different place.
This supports the following:

* matches anything zero or more times.

? matches a single character.

\* matches a literal *.

\? matches a literal ?.

Invalid escapes will cause an error.
So the Override struct no longer uses a FQN, and instead has split out
*Match items for ColumnName, TableCatalog, TableSchema, and TableRel.

There is a new *Override.Matches function that indicates if a given
*ast.TableName (with a default schema argument) matches the override.

And the code that used to use sameTableName in the python and golang
code generators now uses the Matches function.
We're doing our own matching thing now.
@zelch
Copy link
Contributor Author

zelch commented Sep 15, 2021

Alright, so I had a little time, so I rewrote the glob support. The tests are unchanged and still function.

With that said, there are some improvements, we no longer rely on an outside library for globbing, I used https://github.com/zyedidia/glob for inspiration, but the matching code is now in internal/config/match.go, and it supports the following:

* matches anything zero or more times.

? matches a single character.

\* matches a literal *.

\? matches a literal ?.

Invalid escapes will cause an error.

Further, all of the matching types are isolated in Override, with no more changes to the FQN type. And there is now a function:

func (o *Override) Matches(n *ast.TableName, defaultSchema string) bool

I think this addresses all of the concerns, and it's definitely a lot cleaner of a change compared to how it was previously.

@zelch zelch requested a review from kyleconroy September 17, 2021 03:25
@kyleconroy
Copy link
Collaborator

This looks great. I may change a few small things after merge, but this is ready to go. Thanks for being willing to iterate on the code.

@kyleconroy kyleconroy merged commit 819f197 into sqlc-dev:main Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants