diff --git a/source/fundamentals.txt b/source/fundamentals.txt
index 04e652175..ea3f09b50 100644
--- a/source/fundamentals.txt
+++ b/source/fundamentals.txt
@@ -20,5 +20,6 @@ Fundamentals
    /fundamentals/gridfs
    /fundamentals/time-series
    /fundamentals/typescript
+   /fundamentals/utf8-validation
 
 .. include:: /includes/fundamentals-sections.rst
diff --git a/source/fundamentals/utf8-validation.txt b/source/fundamentals/utf8-validation.txt
new file mode 100644
index 000000000..361e2e8c8
--- /dev/null
+++ b/source/fundamentals/utf8-validation.txt
@@ -0,0 +1,120 @@
+.. _nodejs-utf-8-validation:
+
+================
+UTF-8 Validation
+================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: singlecol
+
+Overview
+--------
+
+In this guide, you can learn how to enable or disable the {+driver-short+}'s
+**UTF-8** validation feature. UTF-8 is a character encoding specification
+that ensures compatibility and consistent presentation across most operating
+systems, applications, and language character sets.
+
+If you *enable* validation, the driver throws an error when it attempts to
+convert data that contains invalid UTF-8 characters. Validation adds
+processing overhead because the driver must check the data.
+
+If you *disable* validation, your application avoids the validation processing
+overhead, but it cannot guarantee consistent presentation of invalid UTF-8 data.
+
+The driver enables UTF-8 validation by default. It checks documents for any
+characters that are not encoded in a valid UTF-8 format when it transfers data
+between your application and MongoDB.
+
+.. note::
+
+   The current version of the {+driver-short+} automatically replaces
+   invalid UTF-8 characters with valid UTF-8 characters before
+   validation when you send data to MongoDB. Therefore, validation
+   throws an error only when the setting is enabled and the driver
+   receives invalid UTF-8 document data from MongoDB.
+
+Read the following sections to learn how to configure UTF-8 validation with
+the {+driver-short+}.
+
+.. _nodejs-specify-utf-8-validation:
+
+Specify the UTF-8 Validation Setting
+------------------------------------
+
+You can specify whether the driver performs UTF-8 validation by setting
+the ``enableUtf8Validation`` option in the options parameter when you
+create a client, reference a database or collection, or call a CRUD
+operation. If you omit the option, the driver enables UTF-8 validation.
+
+The following code examples show how to disable UTF-8 validation on a
+client, database, collection, or CRUD operation:
+
+.. code-block:: javascript
+
+   // Disable UTF-8 validation on the client
+   new MongoClient('', { enableUtf8Validation: false });
+
+   // Disable UTF-8 validation on the database
+   client.db('', { enableUtf8Validation: false });
+
+   // Disable UTF-8 validation on the collection
+   db.collection('', { enableUtf8Validation: false });
+
+   // Disable UTF-8 validation on a specific operation call
+   await collection.findOne({ title: 'Cam Jansen' }, { enableUtf8Validation: false });
+
+If your application reads invalid UTF-8 data from MongoDB while the
+``enableUtf8Validation`` option is enabled, the driver throws a
+``BSONError`` that contains the following message:
+
+.. code-block:: none
+
+   Invalid UTF-8 string in BSON document
+
+.. _nodejs-utf-8-validation-scope:
+
+Set the Validation Scope
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``enableUtf8Validation`` setting applies to the scope of the object
+instance on which you set it and to any other objects that you create
+from calls on that instance.
+
+For example, if you include the option on the call to instantiate a database
+object, any collection instance you construct from that object inherits
+the setting. Any operations you call on that collection instance also
+inherit the setting.
+
+.. code-block:: javascript
+
+   const database = client.db('books', { enableUtf8Validation: false });
+
+   // The collection inherits the UTF-8 validation disabled setting from the database
+   const collection = database.collection('mystery');
+
+   // CRUD operation runs with UTF-8 validation disabled
+   await collection.findOne({ title: 'Encyclopedia Brown' });
+
+You can override the setting at any level of scope by including it when
+constructing the object instance or when calling an operation.
+
+For example, if you disable validation on the collection object, you can
+override the setting in individual CRUD operation calls on that
+collection.
+
+.. code-block:: javascript
+
+   const collection = database.collection('mystery', { enableUtf8Validation: false });
+
+   // CRUD operation runs with UTF-8 validation enabled
+   await collection.findOne({ title: 'Trixie Belden' }, { enableUtf8Validation: true });
+
+   // CRUD operation runs with UTF-8 validation disabled
+   await collection.findOne({ title: 'Enola Holmes' });
+
diff --git a/source/includes/fundamentals-sections.rst b/source/includes/fundamentals-sections.rst
index aecdb0601..6da2915e3 100644
--- a/source/includes/fundamentals-sections.rst
+++ b/source/includes/fundamentals-sections.rst
@@ -13,6 +13,7 @@ Fundamentals section:
 - :doc:`Log Events in the Driver `
 - :doc:`Monitor Driver Events `
 - :doc:`Store and Retrieve Large Files in MongoDB `
-- :doc:`Create and Query Time Series Collection`
+- :doc:`Create and Query Time Series Collection `
 - :doc:`Specify Type Parameters with TypeScript `
+- :doc:`Specify UTF-8 Validation Settings `
 
diff --git a/source/whats-new.txt b/source/whats-new.txt
index 5d17c1cd4..4da495071 100644
--- a/source/whats-new.txt
+++ b/source/whats-new.txt
@@ -26,7 +26,7 @@ What's New in 4.3
 New features of the 4.3 Node.js driver release include:
 
 - SOCKS5 support
-- Option to disable UTF-8 validation
+- Option to :ref:`disable UTF-8 validation `
 - Type inference for nested documents
 
 .. _version-4.2: