DOCSP-41972: GridFS guide #133

Merged
17 commits merged on Sep 16, 2024
Binary file added source/includes/figures/GridFS-upload.png
72 changes: 72 additions & 0 deletions source/includes/write/gridfs.php
@@ -0,0 +1,72 @@
<?php
require 'vendor/autoload.php'; // include Composer's autoloader

use MongoDB\Client;
Review comment (Member):

I missed this earlier. The code below uses new MongoDB\Client(...) so this use statement has no effect. Consider removing it.

Alternatively, decide if you want to rely on use statements for all class references in the interest of consistency. I see you're doing that for new ObjectId(...).

use MongoDB\BSON\ObjectID;
Review comment (Member):

Suggested change:
- use MongoDB\BSON\ObjectID;
+ use MongoDB\BSON\ObjectId;

PHP allows case-insensitive class names, but the canonical name is ObjectId.

use MongoDB\GridFS\Bucket;
use MongoDB\Driver\Manager;
Review comment (Member):

Suggested change (remove these lines):
- use MongoDB\GridFS\Bucket;
- use MongoDB\Driver\Manager;

These use statements are no longer needed.


$uri = getenv('MONGODB_URI') ?: throw new RuntimeException('Set the MONGODB_URI variable to your Atlas URI that connects to the sample dataset');
$client = new MongoDB\Client($uri);

// Creates a GridFS bucket or references an existing one
// start create bucket
$db = $client->db;
$bucket = $db->selectGridFSBucket();
// end create bucket

// Constructs a GridFS bucket
// start create bucket explicit
$manager = new Manager('<connection string>');
$bucket = new Bucket($manager, 'db');
// end create bucket explicit

// Creates or references a GridFS bucket with a custom name
// start create custom bucket
$custom_bucket = $client->db->selectGridFSBucket(
    ['bucketName' => 'myCustomBucket']
);
// end create custom bucket

// Uploads a file called "my_file" to the GridFS bucket and writes data to it
// start upload files
$stream = $bucket->openUploadStream('my_file', [
    'chunkSizeBytes' => 1048576,
    'metadata' => ['contentType' => 'text/plain']
]);
fwrite($stream, 'Data to store');
fclose($stream);
// end upload files

// Prints information about each file in the bucket
// start retrieve file info
$files = $bucket->find();
foreach ($files as $file_doc) {
    echo json_encode($file_doc), PHP_EOL;
}
// end retrieve file info

// Downloads the "my_file" file from the GridFS bucket and prints its contents
// start download files name
$stream = $bucket->openDownloadStreamByName('my_file');
$contents = stream_get_contents($stream);
echo json_encode($contents) , PHP_EOL;
fclose($stream);
// end download files name

// Downloads a file from the GridFS bucket by referencing its ObjectID value
// start download files id
$stream = $bucket->openDownloadStream(new ObjectID('66e0a5487c880f844c0a32b1'));
Review comment (Member):

Suggested change:
- $stream = $bucket->openDownloadStream(new ObjectID('66e0a5487c880f844c0a32b1'));
+ $stream = $bucket->openDownloadStream(new ObjectId('66e0a5487c880f844c0a32b1'));

$contents = stream_get_contents($stream);
fclose($stream);
// end download files id

// Renames a file from the GridFS bucket with the specified ObjectID
Review comment (Member):

Suggested change:
- // Renames a file from the GridFS bucket with the specified ObjectID
+ // Renames a file from the GridFS bucket with the specified ObjectId

Consistency with the server manual.

// start rename files
$bucket->rename(new ObjectID('66e0a5487c880f844c0a32b1'), 'new_file_name');
Review comment (Member):

Suggested change:
- $bucket->rename(new ObjectID('66e0a5487c880f844c0a32b1'), 'new_file_name');
+ $bucket->rename(new ObjectId('66e0a5487c880f844c0a32b1'), 'new_file_name');

// end rename files

// Deletes a file from the GridFS bucket with the specified ObjectID
Review comment (Member):

Suggested change:
- // Deletes a file from the GridFS bucket with the specified ObjectID
+ // Deletes a file from the GridFS bucket with the specified ObjectId

// start delete files
$bucket->delete(new ObjectID('66e0a5487c880f844c0a32b1'));
Review comment (Member):

Suggested change:
- $bucket->delete(new ObjectID('66e0a5487c880f844c0a32b1'));
+ $bucket->delete(new ObjectId('66e0a5487c880f844c0a32b1'));

// end delete files
1 change: 1 addition & 0 deletions source/write.txt
@@ -12,3 +12,4 @@ Write Data to MongoDB
/write/replace
/write/insert
/write/update
/write/gridfs
309 changes: 309 additions & 0 deletions source/write/gridfs.txt
@@ -0,0 +1,309 @@
.. _php-gridfs:

=================
Store Large Files
=================

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

.. facet::
   :name: genre
   :values: reference

.. meta::
   :keywords: binary large object, blob, storage, code example

Overview
--------

In this guide, you can learn how to store and retrieve large files in
MongoDB by using **GridFS**. GridFS is a specification, implemented by
the {+php-library+}, that describes how to split files into chunks when
storing them and how to reassemble them when retrieving them. The library's
GridFS implementation is an abstraction that manages file storage operations
and the organization of the stored data.

Use GridFS if the size of your files exceeds the 16 MB BSON document
size limit. For more detailed information on whether GridFS is
suitable for your use case, see :manual:`GridFS </core/gridfs>` in the
{+mdb-server+} manual.

How GridFS Works
----------------

GridFS organizes files in a **bucket**, a group of MongoDB collections
that contain the chunks of files and information describing them. The
bucket contains the following collections, named using the convention
defined in the GridFS specification:

- The ``chunks`` collection stores the binary file chunks.
- The ``files`` collection stores the file metadata.

When you create a new GridFS bucket, the library names the preceding
collections by prefixing them with the bucket name, which is ``fs`` unless
you specify a different name (for example, ``fs.chunks`` and ``fs.files``).
The library also creates an index on each collection to ensure efficient
retrieval of files and their metadata. If the bucket doesn't exist, the
library creates it only when the first write operation is performed, and it
creates the indexes only if they don't already exist and the bucket is empty.
For more information about GridFS indexes, see
:manual:`GridFS Indexes </core/gridfs/#gridfs-indexes>` in the {+mdb-server+} manual.
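
The following sketch models the naming convention and the indexes as plain PHP data, without calling the library. The ``gridfsLayout()`` function is hypothetical; the index key patterns are the ones the GridFS specification defines: ``{ filename: 1, uploadDate: 1 }`` on the ``files`` collection and a unique ``{ files_id: 1, n: 1 }`` index on the ``chunks`` collection.

```php
<?php
// Hypothetical model of a GridFS bucket's layout; it does not call the
// library. Collection names use the bucket name as a prefix, and the index
// key patterns follow the GridFS specification.
function gridfsLayout(string $bucketName = 'fs'): array
{
    return [
        'filesCollection' => $bucketName . '.files',
        'chunksCollection' => $bucketName . '.chunks',
        // Supports looking up files by name, ordered by upload date
        'filesIndex' => ['key' => ['filename' => 1, 'uploadDate' => 1]],
        // Guarantees at most one chunk with a given number "n" per file
        'chunksIndex' => ['key' => ['files_id' => 1, 'n' => 1], 'unique' => true],
    ];
}

$layout = gridfsLayout();
echo $layout['filesCollection'], ', ', $layout['chunksCollection'], PHP_EOL; // fs.files, fs.chunks
```

Passing ``'myCustomBucket'`` instead yields ``myCustomBucket.files`` and ``myCustomBucket.chunks``, which is the naming a custom ``bucketName`` produces.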

When you use GridFS to store a file, the library splits the file into smaller
chunks, each represented by a separate document in the ``chunks`` collection.
It also creates a document in the ``files`` collection that contains
a file ID, file name, and other file metadata. You can upload the file from
memory or from a stream. The following diagram shows how GridFS splits
a file when you upload it to a bucket:

.. figure:: /includes/figures/GridFS-upload.png
   :alt: A diagram that shows how GridFS uploads a file to a bucket
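
To make the splitting step concrete, the following sketch models it in plain PHP. It is not the library's implementation: the ``modelUpload()`` function and the string file ID ``'file-1'`` are hypothetical (the library generates an ``ObjectId`` and stores chunk data as BSON binary), and a chunk size of 5 bytes is used only so that the 13-byte string from this guide's upload example spans several chunks.

```php
<?php
// Hypothetical, simplified model of how GridFS stores a file: one "files"
// document plus numbered "chunks" documents. Not the library's code.
function modelUpload(string $fileId, string $filename, string $data, int $chunkSize): array
{
    if ($chunkSize <= 0) {
        throw new InvalidArgumentException('Chunk size must be positive');
    }

    // An empty file has no chunks; handled explicitly because str_split('')
    // returns [''] before PHP 8.2 and [] from PHP 8.2 on.
    $pieces = $data === '' ? [] : str_split($data, $chunkSize);

    $chunks = [];
    foreach ($pieces as $n => $piece) {
        // Each chunk records its parent file and its position "n"
        $chunks[] = ['files_id' => $fileId, 'n' => $n, 'data' => $piece];
    }

    $file = [
        '_id' => $fileId,
        'filename' => $filename,
        'length' => strlen($data),
        'chunkSize' => $chunkSize,
    ];

    return ['file' => $file, 'chunks' => $chunks];
}

$result = modelUpload('file-1', 'my_file', 'Data to store', 5);
foreach ($result['chunks'] as $chunk) {
    printf("n=%d data=[%s]\n", $chunk['n'], $chunk['data']);
}
// n=0 data=[Data ]
// n=1 data=[to st]
// n=2 data=[ore]
```

Every chunk except the last holds exactly ``chunkSize`` bytes, so the ``length`` and ``chunkSize`` fields of the ``files`` document are enough to know how many chunks to expect.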

When retrieving files, GridFS fetches the metadata from the ``files``
collection in the specified bucket and uses the information to reconstruct
the file from documents in the ``chunks`` collection. You can read the file
into memory or output it to a stream.
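
The retrieval step can be sketched the same way. The following hypothetical ``reassemble()`` function is a simplified model, not the library's code: it orders chunk documents by their chunk number ``n``, checks the chunk count against the ``length`` and ``chunkSize`` fields of the file document, and concatenates the data.

```php
<?php
// Hypothetical, simplified model of rebuilding a file from its chunks.
function reassemble(array $fileDoc, array $chunks): string
{
    // Restore the original order from each chunk's number "n"
    usort($chunks, fn (array $a, array $b): int => $a['n'] <=> $b['n']);

    // Ceiling division: every chunk except the last is full
    $chunkSize = $fileDoc['chunkSize'];
    $expected = intdiv($fileDoc['length'] + $chunkSize - 1, $chunkSize);
    if (count($chunks) !== $expected) {
        throw new RuntimeException("Expected $expected chunks, found " . count($chunks));
    }

    $data = '';
    foreach ($chunks as $i => $chunk) {
        // After sorting, chunk i must have n === i, or one is missing
        if ($chunk['n'] !== $i) {
            throw new RuntimeException("Missing chunk $i");
        }
        $data .= $chunk['data'];
    }
    return $data;
}

$fileDoc = ['length' => 13, 'chunkSize' => 5];
$chunks = [
    ['n' => 2, 'data' => 'ore'],
    ['n' => 0, 'data' => 'Data '],
    ['n' => 1, 'data' => 'to st'],
];
echo reassemble($fileDoc, $chunks), PHP_EOL; // Data to store
```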

.. _gridfs-create-bucket:

Create a GridFS Bucket
----------------------

To store files in or retrieve files from GridFS, call the
``MongoDB\Database::selectGridFSBucket()`` method on your database. This
method returns a ``MongoDB\GridFS\Bucket`` instance that references an
existing bucket, or a new bucket if one does not exist.

The following example calls the ``selectGridFSBucket()`` method on the ``db``
database:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start create bucket
   :end-before: end create bucket

You can also create a GridFS bucket explicitly by calling the
constructor for the ``MongoDB\GridFS\Bucket`` class and passing the following
parameters:

- A ``MongoDB\Driver\Manager`` object, which manages the connections between
  the PHP extension and MongoDB
- The name of the database, as a string, in which to create the bucket

The following example explicitly constructs a GridFS bucket:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start create bucket explicit
   :end-before: end create bucket explicit

.. _gridfs-create-custom-bucket:

Customize the Bucket Name
~~~~~~~~~~~~~~~~~~~~~~~~~

To create or reference a bucket with a custom name other than the default name
``fs``, pass an options array to the ``selectGridFSBucket()`` method that sets
the ``bucketName`` option.

The following example creates a bucket named ``'myCustomBucket'``:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start create custom bucket
   :end-before: end create custom bucket

.. _gridfs-upload-files:

Upload Files
------------

Use the ``MongoDB\GridFS\Bucket::openUploadStream()`` method to
create an upload stream for a given file name. To configure the upload,
pass an options array as the second parameter to ``openUploadStream()``.

This example uses an upload stream to perform the following
actions:

- Opens a writable stream for a new GridFS file named ``'my_file'``
- Sets the ``chunkSizeBytes`` and ``metadata`` options in the options array
  passed to the ``openUploadStream()`` method
- Calls the ``fwrite()`` function to write data to the stream
- Calls the ``fclose()`` function to close the stream, which completes the upload

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start upload files
   :end-before: end upload files
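
If the data is already in a file or another stream, you can pass an open handle to ``MongoDB\GridFS\Bucket::uploadFromStream()`` instead of writing to an upload stream yourself. It accepts a file name, a readable stream, and the same options array, and returns the new file's ``_id``. The following sketch wraps that call in a hypothetical ``uploadLocalFile()`` helper. The ``$bucket`` parameter is deliberately untyped (documented in the docblock) so the helper can be exercised with a stand-in object; in an application, you could type it as ``MongoDB\GridFS\Bucket``.

```php
<?php
/**
 * Hypothetical helper: uploads a local file to GridFS and returns its _id.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 * @param array $options Same options as openUploadStream(), such as
 *                       'chunkSizeBytes' and 'metadata'
 * @return mixed The _id of the new file
 */
function uploadLocalFile($bucket, string $path, string $filename, array $options = [])
{
    $source = fopen($path, 'rb');
    if ($source === false) {
        throw new RuntimeException("Cannot open $path for reading");
    }

    try {
        // Reads $source to the end and stores it as a new GridFS file
        return $bucket->uploadFromStream($filename, $source, $options);
    } finally {
        // The helper opened the handle, so it also closes it
        fclose($source);
    }
}
```

For example, ``uploadLocalFile($bucket, '/path/to/report.txt', 'report.txt', ['metadata' => ['contentType' => 'text/plain']])`` would upload a hypothetical local file and return its ``_id``.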

.. _gridfs-retrieve-file-info:

Retrieve File Information
-------------------------

In this section, you can learn how to retrieve file metadata stored in the
``files`` collection of the GridFS bucket. The metadata contains information
about the file it refers to, including:

- The ``_id`` of the file
- The name of the file
- The length of the file, in bytes
- The upload date and time
- A ``metadata`` document in which you can store any other information

To retrieve file metadata from a GridFS bucket, call the ``MongoDB\GridFS\Bucket::find()``
method on the ``MongoDB\GridFS\Bucket`` instance. The method returns a ``MongoDB\Driver\Cursor``
instance from which you can access the results. To learn more about ``Cursor`` objects in
the {+php-library+}, see the :ref:`<php-cursors>` guide.

The following code example shows you how to retrieve and print file metadata
from files in a GridFS bucket. It uses a ``foreach`` loop to iterate through
the returned cursor and display the metadata of the file uploaded in the
:ref:`gridfs-upload-files` example:

.. io-code-block::
   :copyable:

   .. input:: /includes/write/gridfs.php
      :start-after: start retrieve file info
      :end-before: end retrieve file info
      :language: php
      :dedent:

   .. output::
      :visible: false

      {"_id":{"$oid":"..."},"chunkSize":1048576,"filename":"my_file",
      "length":13,"uploadDate":{"$date":{"$numberLong":"..."}},"metadata":
      {"contentType":"text\/plain"},"md5":"6b24249b03ea3dd176c5a04f037a658c"}

The ``find()`` method accepts a query filter as its first parameter. You can
use its ``$options`` parameter to specify the sort order, the maximum number
of documents to return, and the number of documents to skip before returning
results. To view a list of available options, see the
`API documentation <{+api+}/method/MongoDBGridFSBucket-find/#parameters>`__.
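
The following sketch combines a query filter with the ``sort``, ``limit``, and ``projection`` options to list the stored revisions of one file name, newest first. The ``listRevisions()`` helper is hypothetical, and ``$bucket`` is left untyped so that a stand-in object can replace a real ``MongoDB\GridFS\Bucket``.

```php
<?php
/**
 * Hypothetical helper: lists the revisions stored under one file name,
 * newest first.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 * @return array The matching files documents
 */
function listRevisions($bucket, string $filename, int $limit = 10): array
{
    $cursor = $bucket->find(
        // Query filter: match every revision with this name
        ['filename' => $filename],
        [
            'sort' => ['uploadDate' => -1], // newest revision first
            'limit' => $limit,              // return at most $limit documents
            'projection' => ['_id' => 1, 'length' => 1, 'uploadDate' => 1],
        ]
    );

    $revisions = [];
    foreach ($cursor as $fileDoc) {
        $revisions[] = $fileDoc;
    }
    return $revisions;
}
```

Calling ``listRevisions($bucket, 'my_file', 5)`` would return up to five ``files`` documents for ``'my_file'``, each containing only the ``_id``, ``length``, and ``uploadDate`` fields.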

.. _gridfs-download-files:

Download Files
--------------

You can download files from your MongoDB database by using the
``MongoDB\GridFS\Bucket::openDownloadStreamByName()`` method to
create a download stream.

This example uses a download stream to perform the following actions:

- Selects a GridFS file named ``'my_file'``, uploaded in the
  :ref:`gridfs-upload-files` example, and opens it as a readable stream
- Calls the ``stream_get_contents()`` function to read the contents of ``'my_file'``
- Prints the file contents
- Calls the ``fclose()`` function to close the download stream

.. io-code-block::
   :copyable:

   .. input:: /includes/write/gridfs.php
      :start-after: start download files name
      :end-before: end download files name
      :language: php
      :dedent:

   .. output::
      :visible: false

      "Data to store"

.. note::

   If multiple files share the same file name, GridFS streams the most
   recent one, as determined by the ``uploadDate`` field.
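
To read a revision other than the most recent one, ``openDownloadStreamByName()`` accepts a ``revision`` option: ``0`` is the original upload, ``1`` the first revision after it, ``-1`` (the default) the most recent, and ``-2`` the one before that. The following hypothetical ``readRevision()`` helper is a sketch that passes this option and reads the whole revision into a string; ``$bucket`` is untyped so a stand-in object can be used.

```php
<?php
/**
 * Hypothetical helper: reads one revision of a GridFS file into a string.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 */
function readRevision($bucket, string $filename, int $revision = -1): string
{
    // 0 = original upload, 1 = next revision, ..., -1 = most recent
    $stream = $bucket->openDownloadStreamByName($filename, ['revision' => $revision]);
    try {
        $contents = stream_get_contents($stream);
        if ($contents === false) {
            throw new RuntimeException("Could not read '$filename'");
        }
        return $contents;
    } finally {
        fclose($stream);
    }
}
```

For example, ``readRevision($bucket, 'my_file', 0)`` would return the contents of the first file ever stored as ``'my_file'``, even if newer revisions exist.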

Alternatively, you can use the ``MongoDB\GridFS\Bucket::openDownloadStream()``
method, which takes the ``_id`` field of a file as a parameter:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start download files id
   :end-before: end download files id
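
To write a file straight to disk instead of holding it in a string, you can pass an open, writable stream to ``MongoDB\GridFS\Bucket::downloadToStream()``, which copies the file with the given ``_id`` into that stream. The ``saveToFile()`` helper below is a hypothetical sketch; ``$bucket`` is untyped so a stand-in object can be used for testing.

```php
<?php
/**
 * Hypothetical helper: saves a GridFS file to a local path.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 * @param mixed $id The file's _id, such as a MongoDB\BSON\ObjectId
 */
function saveToFile($bucket, $id, string $path): void
{
    $destination = fopen($path, 'wb');
    if ($destination === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    try {
        // Copies every chunk of the file into $destination
        $bucket->downloadToStream($id, $destination);
    } finally {
        fclose($destination);
    }
}
```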

.. note::

   The GridFS streaming API cannot load partial chunks. When a download
   stream needs to pull a chunk from MongoDB, it pulls the entire chunk
   into memory. The default chunk size of 255 KiB is usually
   sufficient, but you can set a smaller ``chunkSizeBytes`` value when
   uploading a file to reduce memory overhead.
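
The chunk size also determines how many documents a file occupies in the ``chunks`` collection. The following sketch does that arithmetic with integer ceiling division for a hypothetical 100 MiB file, comparing the 255 KiB default with the 1 MiB (``1048576``-byte) chunk size used in this guide's upload example. The ``chunkCount()`` function is illustrative, not part of the library. Smaller chunks mean less memory buffered per read but more documents to store and fetch.

```php
<?php
// Default GridFS chunk size: 255 KiB
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Number of chunk documents needed for a file of $length bytes.
// Integer ceiling division avoids floating-point rounding on large files.
function chunkCount(int $length, int $chunkSize = DEFAULT_CHUNK_SIZE): int
{
    return intdiv($length + $chunkSize - 1, $chunkSize);
}

$length = 100 * 1024 * 1024; // hypothetical 100 MiB file
printf("%d chunks of %d bytes\n", chunkCount($length), DEFAULT_CHUNK_SIZE); // 402 chunks of 261120 bytes
printf("%d chunks of %d bytes\n", chunkCount($length, 1048576), 1048576);   // 100 chunks of 1048576 bytes
```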

.. _gridfs-rename-files:

Rename Files
------------

Use the ``MongoDB\GridFS\Bucket::rename()`` method to update the name of
a GridFS file in your bucket. You must specify the file to rename by its
``_id`` field rather than its file name.

The following example shows how to update the ``filename`` field to
``'new_file_name'`` by referencing a document's ``_id`` field:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start rename files
   :end-before: end rename files

.. note::

   The ``rename()`` method supports updating the name of only one file at
   a time. To rename multiple files, retrieve a list of files matching the
   file name from the bucket, extract the ``_id`` field from the files you
   want to rename, and pass each value in separate calls to the ``rename()``
   method.
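
The steps in the preceding note can be sketched as a hypothetical ``renameAllRevisions()`` helper. Because each upload under the same name is stored as a separate file, renaming a single ``_id`` leaves the other revisions under the old name. This sketch collects every matching ``_id`` first and then renames each one. ``$bucket`` is untyped so that a stand-in object can replace a ``MongoDB\GridFS\Bucket``.

```php
<?php
/**
 * Hypothetical helper: renames every revision stored under $filename.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 * @return int Number of files renamed
 */
function renameAllRevisions($bucket, string $filename, string $newFilename): int
{
    // Collect every matching _id before changing any file name
    $ids = [];
    $files = $bucket->find(['filename' => $filename], ['projection' => ['_id' => 1]]);
    foreach ($files as $fileDoc) {
        $ids[] = $fileDoc['_id'];
    }

    // rename() changes one file per call
    foreach ($ids as $id) {
        $bucket->rename($id, $newFilename);
    }
    return count($ids);
}
```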

.. _gridfs-delete-files:

Delete Files
------------

Use the ``MongoDB\GridFS\Bucket::delete()`` method to remove a file's document
from the ``files`` collection and its associated chunks from the ``chunks``
collection. This effectively deletes the file.
Review comment (Member):

May be worth clarifying that this is going to delete a revision of a file. If another fs.files document exists for that filename, it can still be read. And if users aren't aware of GridFS' revision behavior and selecting the most recent by default, they could be surprised to discover that they can still query for a filename after calling delete(). The clarification should help avoid that.

Note that this also applies to rename(). Common advice for both may be telling users to collect all IDs for a filename if they intend to rename or delete all of its revisions. I don't think you need example code for that (can be left as an exercise for the reader).

Reply (Collaborator Author):

I modified the note below the code examples with this information

Reply (Member):

I see the new "File Revisions" notes for renames/deletes. Is there a DOCSP ticket to track future work of documenting the revision option for download methods? That seems like it'd be generally relevant to all driver GridFS tutorials.

Reply (Collaborator Author):

Not currently, although a few pages have a section on revisions (like Java) or mention it somewhere on the page

You must specify the file by its ``_id`` field rather than its file name.

The following example shows you how to delete a file by referencing its ``_id`` field:

.. literalinclude:: /includes/write/gridfs.php
   :language: php
   :dedent:
   :start-after: start delete files
   :end-before: end delete files

.. note::

   The ``delete()`` method supports deleting only one file at a time. To
   delete multiple files, retrieve the files from the bucket, extract
   the ``_id`` field from the files you want to delete, and pass each value
   in separate calls to the ``delete()`` method.
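
Deleting a single ``_id`` removes only that file; if other ``files`` documents share its name, ``openDownloadStreamByName()`` can still return the most recent remaining one. The following hypothetical ``deleteAllRevisions()`` sketch follows the preceding note: it collects every matching ``_id`` and then calls ``delete()`` once per file. ``$bucket`` is untyped so a stand-in object can be used.

```php
<?php
/**
 * Hypothetical helper: deletes every revision stored under $filename.
 *
 * @param \MongoDB\GridFS\Bucket $bucket Untyped so a stand-in can be used
 * @return int Number of files deleted
 */
function deleteAllRevisions($bucket, string $filename): int
{
    // Collect every matching _id before deleting anything
    $ids = [];
    $files = $bucket->find(['filename' => $filename], ['projection' => ['_id' => 1]]);
    foreach ($files as $fileDoc) {
        $ids[] = $fileDoc['_id'];
    }

    // delete() removes one file (its files document and chunks) per call
    foreach ($ids as $id) {
        $bucket->delete($id);
    }
    return count($ids);
}
```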

API Documentation
-----------------

To learn more about using the {+php-library+} to store and retrieve large files,
see the following API documentation:

- :phpmethod:`MongoDB\Database::selectGridFSBucket()`
- :phpmethod:`MongoDB\GridFS\Bucket::__construct()`
- :phpmethod:`MongoDB\GridFS\Bucket::openUploadStream()`
- :phpmethod:`MongoDB\GridFS\Bucket::find()`
- :phpmethod:`MongoDB\GridFS\Bucket::openDownloadStream()`
- :phpmethod:`MongoDB\GridFS\Bucket::rename()`
- :phpmethod:`MongoDB\GridFS\Bucket::delete()`