Skip to content

cqltypes: Serialize None values in collections as NULLs #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 18, 2023

Conversation

Lorak-mmk
Copy link

@Lorak-mmk Lorak-mmk commented Jul 13, 2023

When using parepared statements, None values in collections were serialized as empty values (values with length == 0). This is unexpected and inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:

  • Statement arguments, both for simple and prepared statements
  • Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.

Fixes #201

@Lorak-mmk Lorak-mmk requested review from fruch and avelanarius July 13, 2023 15:45
@fruch
Copy link

fruch commented Jul 13, 2023

Look good, while we are waiting for the ccm to be fixed.

Maybe you should open a PR to upstream as well ?

Copy link

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think you can/should check key/value being null before calling some to_binary on it, and the reason will also be clearer, but I guess your version works just as correctly too.

buf.write(keybytes)
buf.write(pack(len(valbytes)))
buf.write(valbytes)
if keybytes is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: shouldn't you check if key is None, instead of first converting it to binary and checking if the result is None?

@nyh
Copy link

nyh commented Jul 13, 2023

Also, you should probably have a test for this change, but I don't how the driver is tested (is it tested against some real version of Cassandra/Scylla?).
As @fruch noted, this fix should also be suggested upstream - https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/PYTHON-1355. There's nothing Scylla-specific about it.

@Lorak-mmk Lorak-mmk force-pushed the fix-none-in-collections branch from 11f79e6 to 3851c1a Compare July 14, 2023 18:03
@Lorak-mmk
Copy link
Author

Applied the change suggested by @nyh , also added an integration tests to verify new bahavior.
As a slightly unrelated change, I bumped Scylla version in CI to 5.1 - 5.0 is no longer supported.

Copy link

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think int32_pack should be changed to pack for symmetry with the other code (and potentially, correctness on Antediluvian versions of Cassandra).

buf.write(pack(len(itembytes)))
buf.write(itembytes)
if item is None:
buf.write(int32_pack(-1))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this (and also two more times below) should be pack instead of int32_pack.
I don't think we actually support of test protocol version < 3, but if we use int16_pack in pack(len(keybytes)), I don't see any reason why packing -1 should be different.

I don't think this matters in practice because nobody uses cql version < 3, but let's make this clean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before my changes there were 2 places that used int32_pack(-1) - serializing tuples and UDTs. Now that I think of it, it may be because those are not supported before v3?
More importantly, is the pack logic (pack = int32_pack if protocol_version >= 3 else uint16_pack) sound? Before making this change I looked at docs for v2: https://github.com/apache/cassandra/blob/c08aaf69b4bb62e2fe50342dfaa329dfd833ee1e/doc/native_protocol_v2.spec
Section 6 says:

  All values are represented as [bytes] in EXECUTE and RESULT messages.
  The [bytes] format includes an int prefix denoting the length of the value.
  For that reason, the serialization formats described here will not include
  a length component.

  For legacy compatibility reasons, note that most non-string types support
  "empty" values (i.e. a value with zero length).  An empty value is distinct
  from NULL, which is encoded with a negative length.

[bytes] has 4-byte length field. I think there may be something I don't understand, but I don't see how pack is correct.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea if this is correct for version 2 of the protocol, but it can be tested (on an antique version of Cassandra). What bothered me was the inconsistency - one branch of the if() uses "pack" and the other "int32_pack", and that must be wrong (I don't know which branch is more wrong...).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the current code is the correct one - I'll fix my change. Sections 6.10-6.13 of aforementioned document describe that each item of collection is serialized as [short bytes], not [bytes].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

i = 0
for simple_type in PRIMITIVE_DATATYPES_KEYS:
if simple_type == 'blob':
continue # Unhashable type
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment mean? You can have a list of blobs in CQL, I'm pretty sure it also works somehow in the Python driver?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Fixes scylladb#201

When using parepared statements, None values in collections
were serialized as empty values (values with length == 0). This is unexpected and
inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
 - Statement arguments, both for simple and prepared statements
 - Collection elements in simple statement

This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
It also adds an integration test that checks new behavior.
5.0 is no longer supported.
@Lorak-mmk Lorak-mmk force-pushed the fix-none-in-collections branch from 3851c1a to 4f5f800 Compare July 18, 2023 13:27
Copy link

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fruch fruch merged commit d0f472f into scylladb:master Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"None" in collection is translated to an empty value instead of null value
3 participants