add "ann" as reserved keyword #2005

Open
wants to merge 1 commit into
base: 4.x

Hazel-Datastax

I found a corner case when using the Data API (stargate/data-api#1806): I cannot use ann as a table name there, but I can use it in CQL:

cassandra@cqlsh:default_keyspace> CREATE TABLE default_keyspace."ann" (t text PRIMARY KEY,v VECTOR<float,5>);

The reason is that the Java driver keeps a set containing all the reserved keywords. When the query builder builds the CREATE TABLE query, it calls tableName.asCql(true). Inside asCql(true), the driver checks whether the string is in the reserved keywords set and double-quotes it if so. Unfortunately, the set doesn't contain ann.
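To make the mechanism concrete, here is a minimal sketch of the keyword-aware quoting described above. It is not the driver's source: the class QuotingSketch, its method names, and the five-entry RESERVED set are hypothetical stand-ins chosen for illustration. The point is only that a lowercase identifier missing from the set comes back unquoted.

```java
import java.util.Set;

// Simplified, hypothetical model of keyword-aware identifier quoting.
// Not the driver's actual code; the keyword set is deliberately tiny.
final class QuotingSketch {

  // A small sample of CQL reserved keywords. Note that "ann" is absent.
  static final Set<String> RESERVED = Set.of("select", "from", "where", "table", "create");

  // True when the identifier cannot be written bare in generated CQL.
  static boolean needsDoubleQuotes(String id) {
    if (id.isEmpty()) {
      return true;
    }
    char first = id.charAt(0);
    if (first < 'a' || first > 'z') {
      return true; // must start with a lowercase letter
    }
    for (int i = 0; i < id.length(); i++) {
      char c = id.charAt(i);
      boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
      if (!ok) {
        return true; // uppercase or special characters force quoting
      }
    }
    return RESERVED.contains(id); // the keyword lookup this PR is about
  }

  // Wraps the identifier in double quotes, doubling any embedded quote.
  static String doubleQuote(String id) {
    return "\"" + id.replace("\"", "\"\"") + "\"";
  }

  // pretty == true: quote only when required; pretty == false: always quote.
  static String asCql(String id, boolean pretty) {
    return (pretty && !needsDoubleQuotes(id)) ? id : doubleQuote(id);
  }
}
```

In this model, asCql("table", true) comes back double-quoted while asCql("ann", true) comes back bare, which is the gap this PR is about.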

I guess ann was introduced later and the keywords set hasn't been updated accordingly.

@absurdfarce
Contributor

Good catch @Hazel-Datastax! We actually had to address something very similar to this for dsbulk. Should've occurred to me this part of the Java driver might have an issue as well.

@absurdfarce
Contributor

absurdfarce commented Apr 25, 2025

So, there's definitely something weird going on here.

In Apache Cassandra 5.x "ann" is very definitely an unreserved keyword. The CQL docs in the Cassandra repo talk about the distinction a bit; reserved keywords can never be used as an identifier while unreserved keywords can in some situations... but those situations aren't specified. If an unreserved keyword is used in a spot that might introduce conflict it presumably would have to be quoted... but it's not clear how the driver can identify such a situation.

The dsbulk change I referenced above doesn't need to worry about this distinction. It includes its own ANTLR-derived parser (a subset of what's actually used in Cassandra) so it can identify these keyword cases using (essentially) the same grammar Apache Cassandra uses.

I also note that the set this PR adds "ann" to is explicitly for reserved keywords; note that each member of that set is a reserved keyword (as defined in the CQL docs above) and that no unreserved keywords are included. Presumably that's true because the code can always quote reserved keywords when generating CQL strings... but unreserved keywords are a bit trickier.

To make it even worse: I note the following against Apache Cassandra 5.0.0:

cqlsh> describe keyspace test;

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
cqlsh> CREATE TABLE test.filtering (i int PRIMARY KEY, j float);
cqlsh> CREATE TABLE test.ann (i int PRIMARY KEY, j float);
cqlsh>

The string "ann" works just fine as a table name there. But when I try something similar on Astra I get results similar to what I think you're describing:

token@cqlsh> CREATE TABLE janus.filtering (i int PRIMARY KEY, j float);
token@cqlsh> CREATE TABLE janus.function (i int PRIMARY KEY, j float);
token@cqlsh> CREATE TABLE janus.ann (i int PRIMARY KEY, j float);
SyntaxException: line 1:23 mismatched character '(' expecting set null
token@cqlsh> CREATE TABLE janus.”ann” (i int PRIMARY KEY, j float);
Invalid syntax at char 20
  CREATE TABLE janus.”ann” (i int PRIMARY KEY, j float);
                     ^
token@cqlsh> CREATE TABLE janus.’ann’ (i int PRIMARY KEY, j float);
Invalid syntax at char 20
  CREATE TABLE janus.’ann’ (i int PRIMARY KEY, j float);
                     ^

So we've clearly got inconsistencies in the behaviour here between Astra and Apache Cassandra. But to make matters worse Astra is internally inconsistent: some unreserved keywords (such as "filtering" and "function") are just fine to use as table names while I can't get "ann" to be used as a table name whether I quote it or not.

@absurdfarce
Contributor

@adutra @aratno @tolbertam I'm curious about what you guys think of this. Short version:

  • C* now defines "ann" and "vector" as unreserved keywords
  • The driver only has logic to specifically quote strings containing reserved keywords... since those need to be quoted in all cases when used in queries
  • Unreserved keywords need to be quoted in some situations but not others

My current thinking is that there isn't really much we can do here. Without better guidance as to when unreserved keywords should or shouldn't be quoted, the Java driver can't really intervene, so it's up to the user to quote unreserved keywords when appropriate. If you have a full-blown CQL parser you could do better (see the referenced dsbulk issue above) but short of that you're kind of limited.
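As a rough illustration of what "up to the user" could look like, here is a sketch of a caller-side helper that always emits the double-quoted form, so it never has to decide whether a name is a keyword. The class ExplicitQuoting and its methods are hypothetical and not part of the driver. Two assumptions to keep in mind: quoted identifiers are case-sensitive, so the caller must pass the name exactly as stored, and whether a given server accepts the resulting statement is outside what this sketch can guarantee.

```java
// Hypothetical caller-side helper: always emit the double-quoted form,
// sidestepping any reserved/unreserved keyword lookup.
final class ExplicitQuoting {

  // Wraps the identifier in double quotes, doubling any embedded quote.
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  // Builds a keyspace-qualified name with both parts quoted.
  static String qualifiedName(String keyspace, String table) {
    return quote(keyspace) + "." + quote(table);
  }
}
```

For example, ExplicitQuoting.qualifiedName("janus", "ann") yields "janus"."ann" with plain ASCII double quotes. As far as I understand the driver, asCql(false) takes the same always-quote approach, though that is worth confirming against the CqlIdentifier Javadoc.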

Thoughts?

2 participants