-
Notifications
You must be signed in to change notification settings - Fork 48
cqltypes: Serialize None values in collections as NULLs #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Look good, while we are waiting for the ccm to be fixed. Maybe you should open a PR to upstream as well ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think you can/should check key/value being null before calling some to_binary on it, and the reason will also be clearer, but I guess your version works just as correctly too.
cassandra/cqltypes.py
Outdated
buf.write(keybytes) | ||
buf.write(pack(len(valbytes))) | ||
buf.write(valbytes) | ||
if keybytes is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: shouldn't you check if key is None, instead of first converting it to binary and checking if the result is None?
Also, you should probably have a test for this change, but I don't how the driver is tested (is it tested against some real version of Cassandra/Scylla?). |
11f79e6
to
3851c1a
Compare
Applied the change suggested by @nyh , also added an integration tests to verify new bahavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I think int32_pack should be changed to pack for symmetry with the other code (and potentially, correctness on Antediluvian versions of Cassandra).
cassandra/cqltypes.py
Outdated
buf.write(pack(len(itembytes))) | ||
buf.write(itembytes) | ||
if item is None: | ||
buf.write(int32_pack(-1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this (and also two more times below) should be pack instead of int32_pack.
I don't think we actually support of test protocol version < 3, but if we use int16_pack in pack(len(keybytes)), I don't see any reason why packing -1 should be different.
I don't think this matters in practice because nobody uses cql version < 3, but let's make this clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before my changes there were 2 places that used int32_pack(-1)
- serializing tuples and UDTs. Now that I think of it, it may be because those are not supported before v3?
More importantly, is the pack
logic (pack = int32_pack if protocol_version >= 3 else uint16_pack
) sound? Before making this change I looked at docs for v2: https://github.com/apache/cassandra/blob/c08aaf69b4bb62e2fe50342dfaa329dfd833ee1e/doc/native_protocol_v2.spec
Section 6 says:
All values are represented as [bytes] in EXECUTE and RESULT messages.
The [bytes] format includes an int prefix denoting the length of the value.
For that reason, the serialization formats described here will not include
a length component.
For legacy compatibility reasons, note that most non-string types support
"empty" values (i.e. a value with zero length). An empty value is distinct
from NULL, which is encoded with a negative length.
[bytes]
has 4-byte length field. I think there may be something I don't understand, but I don't see how pack
is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if this is correct for version 2 of the protocol, but it can be tested (on an antique version of Cassandra). What bothered me was the inconsistency - one branch of the if() uses "pack" and the other "int32_pack", and that must be wrong (I don't know which branch is more wrong...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the current code is the correct one - I'll fix my change. Sections 6.10-6.13 of aforementioned document describe that each item of collection is serialized as [short bytes]
, not [bytes]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
i = 0 | ||
for simple_type in PRIMITIVE_DATATYPES_KEYS: | ||
if simple_type == 'blob': | ||
continue # Unhashable type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this comment mean? You can have a list of blobs in CQL, I'm pretty sure it also works somehow in the Python driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Fixes scylladb#201 When using parepared statements, None values in collections were serialized as empty values (values with length == 0). This is unexpected and inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases: - Statement arguments, both for simple and prepared statements - Collection elements in simple statement This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases. It also adds an integration test that checks new behavior.
5.0 is no longer supported.
3851c1a
to
4f5f800
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When using parepared statements, None values in collections were serialized as empty values (values with length == 0). This is unexpected and inconsistent - None values are serialized as NULLs (vlaues with length == -1) in other cases:
This commit fixes this weird behavior - now None values should be serialized as NULLs in all cases.
Fixes #201