PYTHON-2252 Add examples and documentation for new UUID behavior #467

prashantmital · 2020-07-17T08:32:59Z

No description provided.

behackett

This is a really good start. I only have two comments. Nice job.

behackett · 2020-07-17T16:44:02Z

doc/examples/uuid.rst

+   written to MongoDB by existing applications that use the Python driver
+   and don't explicitly set a UUID representation.
+
+.. attention:: As of PyMongo 3.11.0,


This implies to me that this is new in 3.11.0. Better to say that PYTHON_LEGACY is the default and has been since PyMongo 2.9, the version it was introduced in.

behackett · 2020-07-17T16:47:48Z

doc/examples/uuid.rst

+
+.. _configuring-uuid-representation:
+
+Configuring a UUID Representation


Somewhere in here you should talk about round tripping data, and you might want to warn about it. It's easy for an application to read and update a record, accidentally changing the byte order of the object in MongoDB.

@prashantmital, an example of changing byte order would be round tripping a Binary subtype 4 with uuid representation C# or Java. The subtype 4 would be decoded to a native UUID and then encoded as Binary subtype 3 with different bytes.
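The following is a minimal stdlib-only sketch of how each representation lays out a UUID's 16 bytes, so the byte-order difference is visible without a server. The helper `encode_uuid` and the plain-string representation names are hypothetical stand-ins for the driver's encoder and `bson.binary.UuidRepresentation`. The Java and C# transforms follow the legacy layouts as the driver is understood to implement them.

```python
import uuid
from typing import Tuple


def encode_uuid(u: uuid.UUID, representation: str) -> Tuple[int, bytes]:
    """Hypothetical helper: return (BSON binary subtype, 16-byte payload)."""
    if representation == "STANDARD":
        return 4, u.bytes  # standard subtype, RFC 4122 byte order
    if representation == "PYTHON_LEGACY":
        return 3, u.bytes  # legacy subtype, same bytes as STANDARD
    if representation == "JAVA_LEGACY":
        # Legacy subtype, each 8-byte half reversed.
        return 3, u.bytes[0:8][::-1] + u.bytes[8:16][::-1]
    if representation == "CSHARP_LEGACY":
        # Legacy subtype, first three fields little-endian (.NET Guid layout).
        return 3, u.bytes_le
    raise ValueError(f"unknown representation: {representation}")


u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
for rep in ("STANDARD", "PYTHON_LEGACY", "JAVA_LEGACY", "CSHARP_LEGACY"):
    subtype, payload = encode_uuid(u, rep)
    print(rep, subtype, payload.hex())
```

The JAVA_LEGACY payload here is the same `wfUD3"\x11\x00\xff...` byte sequence that appears as `b_legacy` in the session in the next comment.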

As clarified on Slack, the bytes are not different, but the subtype is. This is reflected in the new example.

No, the bytes would also change:

>>> u = uuid.UUID('00112233445566778899AABBCCDDEEFF')
>>> b_standard = bson.encode({'u': u}, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_standard, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> u
UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> # The UUID's subtype AND data change here:
>>> b_legacy = bson.encode(decoded, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> b_legacy
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00'
>>> b_standard
b'\x1d\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00'
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> decoded
{'u': UUID('00112233-4455-6677-8899-aabbccddeeff')}
>>> decoded = bson.decode(b_legacy, codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> decoded
{'u': UUID('77665544-3322-1100-ffee-ddccbbaa9988')}
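The next block is a stdlib-only sketch of the decode side of this session. It is modeled only on what the session itself shows:

- Subtype 4 came back as the original UUID even under JAVA_LEGACY.
- STANDARD read subtype 3 with Python's byte order.

The helper `decode_uuid` is hypothetical and mirrors the behavior observed here (PyMongo 3.11 era). Other driver versions may decode mismatched subtypes differently.

```python
import uuid


def decode_uuid(subtype: int, payload: bytes, representation: str) -> uuid.UUID:
    """Hypothetical decoder mirroring the session above."""
    if subtype == 4:
        # Subtype 4 is read in standard byte order, as in the session.
        return uuid.UUID(bytes=payload)
    if subtype != 3:
        raise ValueError(f"not a UUID subtype: {subtype}")
    if representation == "JAVA_LEGACY":
        # Undo the per-half byte reversal.
        return uuid.UUID(bytes=payload[0:8][::-1] + payload[8:16][::-1])
    if representation == "CSHARP_LEGACY":
        return uuid.UUID(bytes_le=payload)
    # PYTHON_LEGACY, and STANDARD reading subtype 3, take the bytes as-is.
    return uuid.UUID(bytes=payload)


# Payloads copied from b_standard (subtype 4) and b_legacy (subtype 3) above.
standard_payload = bytes.fromhex("00112233445566778899aabbccddeeff")
legacy_payload = bytes.fromhex("7766554433221100ffeeddccbbaa9988")

print(decode_uuid(4, standard_payload, "JAVA_LEGACY"))
print(decode_uuid(3, legacy_payload, "JAVA_LEGACY"))
print(decode_uuid(3, legacy_payload, "STANDARD"))
```

The first two calls yield the original UUID. The third yields `77665544-3322-1100-ffee-ddccbbaa9988`, matching the last line of the session.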

I don't follow this completely. Will need to discuss.

Here's the same example written from a different perspective. App A inserts a document with STANDARD. App B finds and replaces the same document with JAVA_LEGACY.

>>> coll_standard = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.STANDARD))
>>> coll_java = client.t.get_collection('t', codec_options=CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY))
>>> coll_raw = client.t.get_collection('t', codec_options=CodecOptions(document_class=RawBSONDocument))
>>> u = UUID('00112233-4455-6677-8899-aabbccddeeff')
>>> coll_standard.insert_one({'_id': 1, 'u': u})
<pymongo.results.InsertOneResult object at 0x7fe15e453500>
>>> # Raw bytes on the server:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x04\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00', ...)
>>> # Find/replace the document from a legacy Java app:
>>> doc = coll_java.find_one({'_id': 1})
>>> coll_java.replace_one({'_id': 1}, doc)
<pymongo.results.UpdateResult object at 0x7fe15e453b00>
>>> # Raw bytes on the server have changed:
>>> coll_raw.find_one({'_id':1})
RawBSONDocument(b'&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05u\x00\x10\x00\x00\x00\x03wfUD3"\x11\x00\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00', ...)

Added an example for this.

behackett

Can you add a reference to this new document everyplace you can set representation?

ShaneHarvey

Nice guide!

doc/examples/uuid.rst

prashantmital · 2020-07-22T06:28:12Z

Ready for final review.

doc/examples/uuid.rst

prashantmital · 2020-07-23T02:43:31Z

Ready for another look.

doc/examples/uuid.rst

ShaneHarvey · 2020-07-23T23:09:11Z

doc/examples/uuid.rst

+  unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
+  Traceback (most recent call last):
+  ...
+  ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured.


Should this error message link to this documentation page? I'm wondering if we can help point the user to the right place.

Not a huge fan of hardcoding links into the codebase, but we can do it if you feel strongly about it.

What about something like "see the documentation for UuidRepresentation for more information".

Sounds good... Making this change.
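The `ValueError` in the excerpt above exists because `UNSPECIFIED` refuses to guess a byte order for a native `uuid.UUID`. The following is a stdlib-only sketch of that rule, under these assumptions:

- `encode_value` is a hypothetical stand-in for the driver's encoder.
- `ExplicitBinary` is a hypothetical stand-in for a `bson.Binary` built with `Binary.from_uuid()`.
- Only STANDARD and UNSPECIFIED are modeled.

```python
import uuid
from typing import NamedTuple


class ExplicitBinary(NamedTuple):
    """Hypothetical stand-in for a bson.Binary built with Binary.from_uuid()."""
    subtype: int
    payload: bytes


def encode_value(value, representation: str):
    """Hypothetical encoder; only STANDARD and UNSPECIFIED are modeled."""
    if isinstance(value, ExplicitBinary):
        # The caller already chose subtype and byte order: write it unchanged.
        return value
    if isinstance(value, uuid.UUID):
        if representation == "UNSPECIFIED":
            # No representation configured, so refuse to guess a byte order.
            raise ValueError(
                "cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED")
        if representation == "STANDARD":
            return ExplicitBinary(4, value.bytes)
        raise NotImplementedError(f"{representation} is not modeled here")
    raise TypeError(f"unsupported type: {type(value).__name__}")


u = uuid.uuid4()
try:
    encode_value(u, "UNSPECIFIED")
except ValueError as exc:
    print("rejected:", exc)

# Converting explicitly first, as the error message suggests, is accepted.
explicit = ExplicitBinary(4, u.bytes)
print(encode_value(explicit, "UNSPECIFIED") == explicit)
```

With the real driver, the explicit conversion corresponds to `Binary.from_uuid(u, UuidRepresentation.STANDARD)`, which the error message itself points to.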

doc/examples/uuid.rst

ShaneHarvey · 2020-07-29T19:15:48Z

doc/examples/uuid.rst

+
+  # Round-tripping the retrieved document silently changes the Binary bytes and subtype
+  java_collection.replace_one({'_id': 'baz'}, doc)
+  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None


I think this should be Binary(input_uuid.bytes, 4)

Edit: Maybe include both lines:

assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None

Done - added the additional assert.
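The two suggested assertions can be checked without a server by modeling the stored field as a `(subtype, payload)` pair. This is a hypothetical, stdlib-only simulation of the find/replace scenario discussed earlier:

- App A writes with STANDARD.
- App B reads and replaces with JAVA_LEGACY, decoding subtype 4 as-is, as in that session.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# Simulated stored field per document id: (BSON binary subtype, payload bytes).
stored = {"baz": (4, u.bytes)}  # app A writes with STANDARD

# App B (JAVA_LEGACY) reads the subtype-4 value back as the same UUID...
subtype, payload = stored["baz"]
decoded = uuid.UUID(bytes=payload)

# ...and replaces the document, re-encoding with the Java legacy layout.
raw = decoded.bytes
stored["baz"] = (3, raw[0:8][::-1] + raw[8:16][::-1])

# Neither query value from the review comment matches any more.
print((4, u.bytes) in stored.values())
print((3, u.bytes) in stored.values())
```

Both checks print `False`. The stored value is now subtype 3 with reversed halves, so a match requires neither the original subtype nor the original byte order.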

ShaneHarvey · 2020-07-29T19:17:19Z

doc/examples/uuid.rst

+used to round-trip UUIDs written with ``STANDARD``. When the situation is
+reversed - i.e. when the original document is written using ``STANDARD``
+and then round-tripped using ``CSHARP_LEGACY`` or ``JAVA_LEGACY`` -
+only the :class:`~bson.binary.Binary` subtype is changed.**


The reverse situation would also change the bytes stored in the database. Also I think the "i.e. when the original document is written using STANDARD..." part is written backwards.

You are right about it being backwards.
However, I think that when the original UUID is stored with *_LEGACY and then round-tripped using STANDARD, only the subtype would change: STANDARD reads subtype 3 as PYTHON_LEGACY and writes it back as subtype 4 without changing the byte order.

Ah right. So the bytes on the server don't change, but the python UUID instance will be different.
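The reverse case from the last two comments is sketched below in the same hypothetical, stdlib-only way:

- A JAVA_LEGACY app writes subtype 3.
- A STANDARD app then round-trips the value, reading subtype 3 with Python's byte order as described above.

The sketch shows that only the subtype changes on the server. The UUID the STANDARD app sees, however, is not the one originally written.

```python
import uuid

original = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
raw = original.bytes

# JAVA_LEGACY app writes: legacy subtype, each 8-byte half reversed.
stored = (3, raw[0:8][::-1] + raw[8:16][::-1])

# STANDARD app reads subtype 3 as PYTHON_LEGACY: payload taken as-is.
seen = uuid.UUID(bytes=stored[1])

# ...and writes it back with STANDARD: subtype 4, bytes unchanged.
rewritten = (4, seen.bytes)

print(rewritten[1] == stored[1])  # payload bytes unchanged
print(rewritten[0], stored[0])    # only the subtype differs
print(seen == original)           # the app saw a different UUID
```

The three lines print `True`, `4 3` and `False`.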

ShaneHarvey

LGTM

PYTHON-2252 Add examples and documentation for new UUID behavior

ec0102e

prashantmital requested review from behackett and ShaneHarvey July 17, 2020 08:33

behackett requested changes Jul 17, 2020

ShaneHarvey reviewed Jul 17, 2020

doc/examples/uuid.rst (outdated, resolved)

doc/examples/uuid.rst (outdated, resolved)

doc/examples/uuid.rst (outdated, resolved)

address review comments

1101ea2

prashantmital requested review from behackett and ShaneHarvey July 21, 2020 16:50

prashantmital added 2 commits July 21, 2020 10:02

cleanup

4692ce9

more improvements

c38b40a

behackett reviewed Jul 22, 2020

doc/examples/uuid.rst (outdated, resolved)

final changes

a57f0a0

prashantmital requested a review from behackett July 23, 2020 02:43

ShaneHarvey reviewed Jul 23, 2020

prashantmital added 2 commits July 24, 2020 10:51

address review comments

9a20e76

review changes

46635e2

prashantmital requested a review from ShaneHarvey July 29, 2020 02:18

ShaneHarvey reviewed Jul 29, 2020

review changes

97b4583

prashantmital requested a review from ShaneHarvey July 29, 2020 21:34

ShaneHarvey approved these changes Jul 29, 2020

prashantmital merged commit ff327b3 into mongodb:master Jul 29, 2020

prashantmital deleted the PYTHON-2252/uuid-documentation branch July 29, 2020 21:46

