-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-2252 Add examples and documentation for new UUID behavior #467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PYTHON-2252 Add examples and documentation for new UUID behavior #467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good start. I only have two comments. Nice job.
doc/examples/uuid.rst
Outdated
written to MongoDB by existing applications that use the Python driver | ||
and don't explicitly set a UUID representation. | ||
|
||
.. attention:: As of PyMongo 3.11.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies to me that this is new in 3.11.0. Better to say that PYTHON_LEGACY is the default and has been since PyMongo 2.9, the version it was introduced in.
|
||
.. _configuring-uuid-representation: | ||
|
||
Configuring a UUID Representation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere in here you should talk about round tripping data, and you might want to warn about it. It's easy for an application to read and update a record accidentally changing the byte order of the object in MongoDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashantmital, an example of changing byte order would be round tripping a Binary subtype 4 with uuid representation C# or Java. The subtype 4 would be decoded to a native UUID and then encoded as Binary subtype 3 with different bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As clarified on Slack, the bytes are not different, but the subtype is. This is reflected in the new example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the bytes would also change:
>>> u = uuid.UUID('00112233445566778899AABBCCDDEEFF')
>>> b_standard = bson.encode({'u': u}, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_standard, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> u
UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> # The UUID's subtype AND data change here:
>>> b_legacy = bson.encode(decoded, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> b_legacy
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00'
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> decoded
{'u': UUID('77665544-3322-1100-ffee-ddccbbaa9988')}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am don't follow this completely. Will need to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the same example written from a different perspective. App A
inserts a document with STANDARD. App B
find and replaces the same document with JAVA_LEGACY.
>>> coll_standard = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> coll_java = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> coll_raw = client.t.get_collection('t', codec_options=CodecOptions(document_class=RawBSONDocument))
>>> u = UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> coll_standard.insert_one({'_id': 1, 'u': u})
<pymongo.results.InsertOneResult object at 0x7fe15e453500>
>>> # Raw bytes on the server:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00', ...)
>>> # Find/replace the document from a legacy Java app:
>>> doc = coll_java.find_one({'_id': 1})
>>> coll_java.replace_one({'_id': 1}, doc)
<pymongo.results.UpdateResult object at 0x7fe15e453b00>
>>> # Raw bytes on the server have changed:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00', ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a reference to this new document everyplace you can set representation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice guide!
Ready for final review. |
Ready for another look. |
doc/examples/uuid.rst
Outdated
unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()}) | ||
Traceback (most recent call last): | ||
... | ||
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this error message link to this documentation page? I'm wondering if we can help point the user to the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge fan of hardcoding links into the codebase, but we can do it if you feel strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like "see the documentation for UuidRepresentation for more information".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good... Making this change.
|
||
# Round-tripping the retrieved document silently changes the Binary bytes and subtype | ||
java_collection.replace_one({'_id': 'baz'}, doc) | ||
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be Binary(input_uuid.bytes, 4)
Edit: Maybe include both lines:
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - added the additional assert.
used to round-trip UUIDs written with ``STANDARD``. When the situation is | ||
reversed - i.e. when the original document is written using ``STANDARD`` | ||
and then round-tripped using ``CSHARP_LEGACY`` or ``JAVA_LEGACY`` - | ||
only the :class:`~bson.binary.Binary` subtype is changed.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reverse situation would also change the bytes stored in the database. Also I think the "i.e. when the original document is written using STANDARD
..." part is written backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about it being backwards.
However, I think that when the original UUID is stored with *_LEGACY and then round-tripped using STANDARD only the subtype would change - STANDARD reads subtype 3 as PYTHON_LEGACY and writes them back as subtype 4 without changing byte-order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. So the bytes on the server don't change, but the python UUID instance will be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.