Skip to content

PYTHON-2252 Add examples and documentation for new UUID behavior #467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

prashantmital
Copy link
Contributor

No description provided.

Copy link
Member

@behackett behackett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good start. I only have two comments. Nice job.

written to MongoDB by existing applications that use the Python driver
and don't explicitly set a UUID representation.

.. attention:: As of PyMongo 3.11.0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies to me that this is new in 3.11.0. Better to say that PYTHON_LEGACY is the default and has been since PyMongo 2.9, the version it was introduced in.


.. _configuring-uuid-representation:

Configuring a UUID Representation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere in here you should talk about round tripping data, and you might want to warn about it. It's easy for an application to read and update a record accidentally changing the byte order of the object in MongoDB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prashantmital, an example of changing byte order would be round tripping a Binary subtype 4 with uuid representation C# or Java. The subtype 4 would be decoded to a native UUID and then encoded as Binary subtype 3 with different bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As clarified on Slack, the bytes are not different, but the subtype is. This is reflected in the new example.

Copy link
Member

@ShaneHarvey ShaneHarvey Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the bytes would also change:

>>> u = uuid.UUID('00112233445566778899AABBCCDDEEFF')
>>> b_standard = bson.encode({'u': u}, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_standard, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> u
UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> # The UUID's subtype AND data change here:
>>> b_legacy = bson.encode(decoded, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> b_legacy
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00'
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> decoded
{'u': UUID('77665544-3322-1100-ffee-ddccbbaa9988')}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am don't follow this completely. Will need to discuss.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the same example written from a different perspective. App A inserts a document with STANDARD. App B find and replaces the same document with JAVA_LEGACY.

>>> coll_standard = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> coll_java = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> coll_raw = client.t.get_collection('t', codec_options=CodecOptions(document_class=RawBSONDocument))
>>> u = UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> coll_standard.insert_one({'_id': 1, 'u': u})
<pymongo.results.InsertOneResult object at 0x7fe15e453500>
>>> # Raw bytes on the server:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00', ...)
>>> # Find/replace the document from a legacy Java app:
>>> doc = coll_java.find_one({'_id': 1})
>>> coll_java.replace_one({'_id': 1}, doc)
<pymongo.results.UpdateResult object at 0x7fe15e453b00>
>>> # Raw bytes on the server have changed:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00', ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example for this.

Copy link
Member

@behackett behackett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a reference to this new document everyplace you can set representation?

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice guide!

@prashantmital
Copy link
Contributor Author

Ready for final review.

@prashantmital
Copy link
Contributor Author

Ready for another look.

@prashantmital prashantmital requested a review from behackett July 23, 2020 02:43
unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
Traceback (most recent call last):
...
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this error message link to this documentation page? I'm wondering if we can help point the user to the right place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of hardcoding links into the codebase, but we can do it if you feel strongly about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like "see the documentation for UuidRepresentation for more information".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good... Making this change.


# Round-tripping the retrieved document silently changes the Binary bytes and subtype
java_collection.replace_one({'_id': 'baz'}, doc)
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
Copy link
Member

@ShaneHarvey ShaneHarvey Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be Binary(input_uuid.bytes, 4)

Edit: Maybe include both lines:

assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - added the additional assert.

used to round-trip UUIDs written with ``STANDARD``. When the situation is
reversed - i.e. when the original document is written using ``STANDARD``
and then round-tripped using ``CSHARP_LEGACY`` or ``JAVA_LEGACY`` -
only the :class:`~bson.binary.Binary` subtype is changed.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reverse situation would also change the bytes stored in the database. Also I think the "i.e. when the original document is written using STANDARD..." part is written backwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about it being backwards.
However, I think that when the original UUID is stored with *_LEGACY and then round-tripped using STANDARD only the subtype would change - STANDARD reads subtype 3 as PYTHON_LEGACY and writes them back as subtype 4 without changing byte-order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. So the bytes on the server don't change, but the python UUID instance will be different.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@prashantmital prashantmital merged commit ff327b3 into mongodb:master Jul 29, 2020
@prashantmital prashantmital deleted the PYTHON-2252/uuid-documentation branch July 29, 2020 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants