Choosing a storage layer
========================

When building a CMS, the choice of storage layer is without doubt one of the key
decisions to take. Many factors must be considered. The good news is that across
all the components and Bundles in the CMF we take extra care to provide the
necessary extension points to ensure the **CMF remains storage layer agnostic**.

The goal of this tutorial is to explain these considerations and why we suggest
`PHPCR <http://phpcr.github.com>`_ and `PHPCR-ODM <http://www.doctrine-project.org/projects/phpcr-odm.html>`_
as the ideal basis for a CMS. However, all components and Bundles can be
integrated with other solutions with a fairly small amount of work.

.. index:: PHPCR, ODM, ORM
| 15 | + |
| 16 | +Requirements for a CMS storage layer |
| 17 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |

At the most fundamental level a CMS is about storing content, so the first requirement
is that *a CMS must provide means to store content with different properties*.

A CMS has very different storage needs than, for example, a system for processing orders.
Note, however, that it is entirely possible, and very much an intention of the CMF
initiative, for developers to combine the CMF with a system for processing orders. For
example, one could create a shopping solution that uses the CMF to store the product
catalog, while using another system to maintain the inventory, customer data and orders.
This leads to the second requirement: *a CMS must provide means to reference content*,
both content stored inside the CMS and content stored in other systems.

The actual content in a CMS tends to be organized in a tree-like structure, mimicking
a file system. Note that content authors might want to use different structures for
organizing the content and for organizing other aspects, like the menu and the routing.
This leads to the third requirement: *a CMS must provide means to represent the content as a tree structure*.
Furthermore, a fourth requirement is that *a CMS should allow maintaining several independent tree structures*.

In general, data inside a CMS tends to be unstructured. While several pages inside
the CMS might be very similar, there is a good chance that there will be many permutations
needing different extra fields; therefore *a CMS must not enforce a singular schema for content*.
That being said, in order to better maintain the content structure and to enable UI layers
to display content elements generically, it is important to be able to optionally
express rules that must be followed and that can also attach additional semantic
meaning. So *a CMS must provide means to optionally define a schema for content elements*.
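
To make the last two requirements concrete, here is a minimal sketch in Python, used
purely for illustration: none of these names come from PHPCR or the CMF. It assumes
content is an open-ended bag of properties and that a hypothetical ``NodeType`` lists
the properties a typed element must carry. Untyped content accepts any fields, while
typed content can be checked.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NodeType:
    """Hypothetical optional schema: names the properties a node must carry."""
    name: str
    required: List[str] = field(default_factory=list)


@dataclass
class Node:
    """A content element with an open-ended set of properties."""
    properties: Dict[str, Any] = field(default_factory=dict)
    node_type: Optional[NodeType] = None

    def validate(self) -> List[str]:
        """Return the required properties that are missing.

        Untyped nodes accept any properties, so they never fail validation.
        """
        if self.node_type is None:
            return []
        return [p for p in self.node_type.required if p not in self.properties]


article = NodeType("article", required=["title", "body"])

# Pages with different extra fields; no shared table schema is needed.
teaser = Node({"title": "Welcome", "image": "hero.jpg"})
story = Node({"title": "News", "body": "...", "author": "jo"}, node_type=article)
draft = Node({"title": "Draft"}, node_type=article)

print(teaser.validate())  # []
print(story.validate())   # []
print(draft.validate())   # ['body']
```

Because validation only applies when a type is attached, the schema stays opt-in, which
is the balance the two requirements ask for.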

The requirement for independent tree structures also relates to another need: a CMS
must make it easy for content authors to prepare a series of changes in a staging
environment that then go online in a single step. So another requirement is that
*a CMS should support moving and exporting content between independent tree structures*.
Exporting is also useful for backups.
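
A minimal sketch of independent trees and of publishing a staged subtree, assuming each
tree is a hypothetical Python dict that maps absolute paths to property dicts. This is a
stand-in for a real workspace, not the PHPCR API.

```python
from typing import Any, Dict

# Hypothetical model: each independent tree maps an absolute path such as
# "/about/team" to that node's properties.
Tree = Dict[str, Dict[str, Any]]


def subtree(tree: Tree, root: str) -> Tree:
    """Return the node at `root` and all of its descendants."""
    prefix = root.rstrip("/") + "/"
    return {p: props for p, props in tree.items() if p == root or p.startswith(prefix)}


def publish(source: Tree, target: Tree, root: str) -> None:
    """Copy the subtree at `root` from `source` into `target`, replacing
    whatever `target` held under that path (so removed pages disappear too)."""
    for path in subtree(target, root):  # subtree() returns a new dict
        del target[path]
    for path, props in subtree(source, root).items():
        target[path] = dict(props)


staging: Tree = {
    "/": {},
    "/about": {"title": "About us (new)"},
    "/about/team": {"title": "Team"},
}
live: Tree = {
    "/": {},
    "/about": {"title": "About us"},
    "/about/history": {"title": "History"},
}

publish(staging, live, "/about")
print(sorted(live))  # ['/', '/about', '/about/team']
```

Matching on ``root + "/"`` keeps a sibling such as ``/aboutus`` from being treated as a
child of ``/about``. Since ``subtree()`` returns a plain dict, the same export could be
serialized, for example with ``json.dumps``, as a backup.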

When making changes, it is also useful to be able to version the change sets,
so that they remain available for historical purposes and can be reverted whenever
needed. Therefore the next requirement is that *a CMS should provide the ability to version content*.
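
A minimal sketch of versioning with revert, assuming a hypothetical ``VersionedNode``
that snapshots its properties on each check-in. This is illustrative Python, not the
PHPCR versioning API.

```python
import copy
from typing import Any, Dict, List


class VersionedNode:
    """Hypothetical node that keeps every checked-in state for later revert."""

    def __init__(self, properties: Dict[str, Any]) -> None:
        self.properties = dict(properties)
        self.history: List[Dict[str, Any]] = []

    def checkin(self) -> int:
        """Freeze the current state as a new version and return its number."""
        self.history.append(copy.deepcopy(self.properties))
        return len(self.history) - 1

    def restore(self, version: int) -> None:
        """Make an earlier version the current state again."""
        self.properties = copy.deepcopy(self.history[version])


page = VersionedNode({"title": "Prices", "body": "10 EUR"})
v0 = page.checkin()
page.properties["body"] = "12 EUR"
v1 = page.checkin()
page.restore(v0)
print(page.properties["body"])  # 10 EUR
print(len(page.history))        # 2
```

Restoring does not discard later versions, so the full history stays available for
historical purposes.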

As we live in a globalized world, websites need to provide content in multiple languages
addressing different regions. However, not all pieces of content need to be translated,
and others might only be translated eventually; until then the user should be presented
the content in one of the available languages. So *a CMS should provide the ability
to store content in different languages, with optional fallback rules*.
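
A minimal sketch of fallback rules, assuming a hypothetical ``FALLBACKS`` table that
lists, per requested locale, which locales to try in order. The locale codes and the
``translate()`` helper are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fallback rules: locales to try, in order, per requested locale.
FALLBACKS: Dict[str, List[str]] = {
    "de_CH": ["de_CH", "de", "en"],
    "de": ["de", "en"],
    "fr": ["fr", "en"],
}


def translate(
    translations: Dict[str, str], locale: str
) -> Optional[Tuple[str, str]]:
    """Return (locale_used, text) for the first available locale in the chain.

    Locales without explicit rules only match themselves.
    """
    for candidate in FALLBACKS.get(locale, [locale]):
        if candidate in translations:
            return candidate, translations[candidate]
    return None


title = {"en": "Opening hours", "de": "Öffnungszeiten"}
print(translate(title, "de_CH"))  # ('de', 'Öffnungszeiten')
print(translate(title, "fr"))     # ('en', 'Opening hours')
print(translate(title, "it"))     # None
```

Returning the locale actually used lets a page indicate that it is showing a fallback
rather than a real translation.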

As a CMS tends to store an ever-increasing amount of content, it becomes necessary
to provide some way for users to search the content, even when they have only a very
fuzzy idea of what they are looking for. This leads to the requirement that
*a CMS must provide full text search capabilities*, ideally leveraging both the content's
tree structure and the data schema.
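
A toy inverted index shows how a full text search can be narrowed by tree structure
(a subtree) and by schema (a single property). The page data, ``search()`` and its
parameters are hypothetical; the sketch only conveys the idea and leaves out ranking,
stemming and persistence.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical content: path -> property name -> text.
pages: Dict[str, Dict[str, str]] = {
    "/products/chairs/oak": {"title": "Oak chair", "body": "Solid oak, hand finished."},
    "/products/tables/oak": {"title": "Oak table", "body": "Seats six people."},
    "/blog/2012/forests": {"title": "Visiting forests", "body": "We saw old oak trees."},
}


def tokens(text: str) -> Set[str]:
    """Lower-cased words of `text` (no stemming, no stop words)."""
    return set(re.findall(r"\w+", text.lower()))


# Inverted index: word -> set of (path, property name) pairs containing it.
index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
for path, props in pages.items():
    for name, text in props.items():
        for word in tokens(text):
            index[word].add((path, name))


def search(word: str, under: str = "/", prop: Optional[str] = None) -> List[str]:
    """Paths whose text contains `word`, limited to the subtree `under`
    and, if given, to the property `prop`."""
    prefix = under.rstrip("/") + "/"
    return sorted(
        {
            path
            for path, name in index.get(word.lower(), set())
            if (path == under or path.startswith(prefix))
            and (prop is None or name == prop)
        }
    )


print(search("oak"))                     # all three pages
print(search("oak", under="/products"))  # ['/products/chairs/oak', '/products/tables/oak']
print(search("oak", prop="body"))        # ['/blog/2012/forests', '/products/chairs/oak']
```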

Another popular need is limiting read and/or write access to content to specific users
or groups. Ideally this solution also integrates with the tree structure. So
*a CMS should provide capabilities to define access controls* that leverage the
tree structure to quickly manage access for entire subtrees.
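
A minimal sketch of access control that leverages the tree, assuming hypothetical
per-path entries that are inherited by every node below them until a deeper entry
overrides them. The groups, paths and ``can_write()`` helper are illustrative only.

```python
from typing import Dict, Iterator, Set

# Hypothetical ACLs: path -> groups allowed to write there. A node without an
# entry inherits the entry of its nearest ancestor, so one entry covers a subtree.
acl: Dict[str, Set[str]] = {
    "/": {"admins"},
    "/press": {"admins", "press-team"},
    "/press/embargoed": {"admins"},
}


def ancestors(path: str) -> Iterator[str]:
    """Yield `path` itself, then each parent path up to and including "/"."""
    while True:
        yield path
        if path == "/":
            return
        path = path.rsplit("/", 1)[0] or "/"


def can_write(group: str, path: str) -> bool:
    """Decide using the closest ACL entry found while walking up the tree."""
    for candidate in ancestors(path):
        if candidate in acl:
            return group in acl[candidate]
    return False


print(can_write("press-team", "/press/2012/launch"))     # True
print(can_write("press-team", "/press/embargoed/deal"))  # False
print(can_write("press-team", "/about"))                 # False
```

Here the nearest entry replaces, rather than extends, the entries above it; that is what
lets ``/press/embargoed`` lock out a group that may write to the rest of ``/press``.
Paths are assumed to be normalized, without trailing slashes.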

Finally, not all steps in the content authoring process will be done by the same person.
As a matter of fact, there might be multiple steps, and some of them might not be done
by a person at all but executed by a machine. For example,
a photographer might upload a new image, a content author might attach the photo
to some text, then the system automatically generates thumbnails and web-optimized
renditions, and finally an editor decides on the publication. Therefore
*a CMS should provide capabilities to assist in the management of workflows*.
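
The example workflow above can be sketched as a small state machine. The state names,
roles and helpers below are hypothetical; the point is that each step names who is
responsible for it, and steps owned by the system run without a person.

```python
from typing import Dict, Tuple

# Hypothetical workflow for the example above:
# current state -> (role responsible for the next step, resulting state).
WORKFLOW: Dict[str, Tuple[str, str]] = {
    "uploaded": ("author", "attached"),           # author attaches the photo to text
    "attached": ("system", "renditions_ready"),   # thumbnails and web renditions
    "renditions_ready": ("editor", "published"),  # editor decides on publication
}


def advance(state: str, actor: str) -> str:
    """Perform the next step if `actor` is responsible for it."""
    if state not in WORKFLOW:
        raise ValueError(f"{state!r} is a final state")
    responsible, next_state = WORKFLOW[state]
    if actor != responsible:
        raise PermissionError(f"{actor!r} cannot act on {state!r}, {responsible!r} must")
    return next_state


def run_machine_steps(state: str) -> str:
    """Run every consecutive step that the system performs without a person."""
    while state in WORKFLOW and WORKFLOW[state][0] == "system":
        state = advance(state, "system")
    return state


state = "uploaded"                # the photographer has uploaded an image
state = advance(state, "author")  # -> "attached"
state = run_machine_steps(state)  # -> "renditions_ready"
state = advance(state, "editor")  # -> "published"
print(state)                      # published
```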

Summary
~~~~~~~

Here is a summary of the above requirements. Note that some of the requirements are
a *must*, while others are only a *should*. Depending on your use case, you might
prioritize features differently:

* a CMS must provide means to store content with different properties
* a CMS must provide means to reference content
* a CMS must provide means to represent the content as a tree structure
* a CMS must provide full text search capabilities
* a CMS must not enforce a singular schema for content
* a CMS must provide means to optionally define a schema for content elements
* a CMS should allow maintaining several independent tree structures
* a CMS should support moving and exporting content between independent tree structures
* a CMS should provide the ability to version content
* a CMS should provide the ability to store content in different languages, with optional fallback rules
* a CMS should provide capabilities to define access controls
* a CMS should provide capabilities to assist in the management of workflows

RDBMS
~~~~~

Looking at the above requirements, it becomes apparent that out of the box an RDBMS is
ill-suited to address the needs of a CMS. RDBMS were never intended to store
tree structures of unstructured content. Really the only requirements an RDBMS covers from
the above list are the ability to store content, some way to reference content,
keeping multiple separate content structures, and a basic level of access controls and triggers.

This is not a failing of RDBMS; they were simply designed for a different
use case: the ability to store, manipulate and aggregate structured data. This makes them
ideal for storing inventory and orders.

That is not to say that it is impossible to build a system on top of an RDBMS that addresses
more, or even all, of the above topics. Some RDBMS natively support recursive queries, which
can be useful for retrieving tree structures. Even if such native support is missing, there
are algorithms like materialized path and nested sets that enable efficient storage
and retrieval of tree structures for different use cases.
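
As a small illustration of two of these techniques, the following sketch uses SQLite via
Python's standard ``sqlite3`` module. The ``node`` table is hypothetical and stores both a
``parent_id`` (adjacency list, walked with a recursive common table expression) and a
materialized ``path`` column, so the same subtree can be fetched both ways. Nested sets
are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- adjacency list
        name TEXT NOT NULL,
        path TEXT NOT NULL                      -- materialized path
    );
    INSERT INTO node VALUES
        (1, NULL, 'cms', '/cms'),
        (2, 1, 'products', '/cms/products'),
        (3, 2, 'chairs', '/cms/products/chairs'),
        (4, 1, 'blog', '/cms/blog');
    """
)

# Recursive query: start at "products" and repeatedly join children
# onto the rows found so far.
recursive = conn.execute(
    """
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM node WHERE id = ?
        UNION ALL
        SELECT n.id, n.name FROM node AS n JOIN sub ON n.parent_id = sub.id
    )
    SELECT name FROM sub ORDER BY id
    """,
    (2,),
).fetchall()

# Materialized path: the same subtree is a prefix match, no recursion needed.
# (Real code should escape "%" and "_" in paths, which LIKE treats as wildcards.)
by_path = conn.execute(
    "SELECT name FROM node WHERE path = ? OR path LIKE ? ORDER BY id",
    ("/cms/products", "/cms/products/%"),
).fetchall()

print(recursive)  # [('products',), ('chairs',)]
print(by_path)    # [('products',), ('chairs',)]
```

The trade-off illustrates "different use cases": the materialized path turns subtree reads
into a simple prefix match, but moving a subtree means rewriting the ``path`` of every
descendant, whereas the adjacency list only updates one ``parent_id`` and pays with a
recursive walk on each read.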

The point, however, is that these all require algorithms and code on top of an RDBMS, which
also tightly bind your business logic to a particular RDBMS and/or algorithm, even if some
of them can be abstracted. For example, using an ORM one could create a pluggable system for
managing tree structures with different algorithms, which prevents binding the business logic
of the CMS to a particular algorithm.

PHPCR
~~~~~

PHPCR is essentially a set of interfaces addressing most of the requirements from the above list.
This means that PHPCR is totally storage agnostic, in the sense that it is possible to put
any persistence solution behind PHPCR. In the same way that an ORM can support different
tree storage algorithms via plugins, PHPCR aims to provide an API for the entire breadth of
CMS needs, thereby cleanly separating the entire business logic of your CMS from the persistence
choice. As a matter of fact, the only feature above not natively supported by PHPCR is support
for translations.
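
PHPCR itself is a PHP API. As a language-neutral illustration of the same separation, here
is a Python sketch in which business logic is written against a hypothetical
``ContentStore`` interface and two interchangeable backends implement it. None of these
names are PHPCR interfaces.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

Props = Dict[str, Any]


class ContentStore(ABC):
    """Hypothetical contract that all CMS business logic is written against."""

    @abstractmethod
    def get(self, path: str) -> Optional[Props]: ...

    @abstractmethod
    def put(self, path: str, properties: Props) -> None: ...


class MemoryStore(ContentStore):
    """Backend keeping nodes in a dict, e.g. for tests."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Props] = {}

    def get(self, path: str) -> Optional[Props]:
        node = self._nodes.get(path)
        return dict(node) if node is not None else None

    def put(self, path: str, properties: Props) -> None:
        self._nodes[path] = dict(properties)


class SqliteStore(ContentStore):
    """Backend on top of an RDBMS; properties are stored as a JSON column."""

    def __init__(self, filename: str = ":memory:") -> None:
        self._db = sqlite3.connect(filename)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS node (path TEXT PRIMARY KEY, props TEXT)"
        )

    def get(self, path: str) -> Optional[Props]:
        row = self._db.execute(
            "SELECT props FROM node WHERE path = ?", (path,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, path: str, properties: Props) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO node (path, props) VALUES (?, ?)",
            (path, json.dumps(properties)),
        )
        self._db.commit()


def rename_page(store: ContentStore, path: str, title: str) -> None:
    """Business logic: depends only on ContentStore, never on a backend."""
    props = store.get(path) or {}
    props["title"] = title
    store.put(path, props)


for store in (MemoryStore(), SqliteStore()):
    rename_page(store, "/cms/home", "Welcome")
    print(type(store).__name__, store.get("/cms/home"))
```

Swapping ``MemoryStore`` for ``SqliteStore`` needs no change to ``rename_page``; PHPCR
aims for the same property across the full range of CMS features.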

Thanks to the availability of several PHPCR implementations supporting various kinds of persistence
choices, creating a CMS on top of PHPCR means that end users can pick and choose
what works best for them, given their available resources, their expertise and their scalability requirements.

For the simplest use cases there is, for example, a Doctrine DBAL based solution provided by the
`Jackalope <https://github.com/jackalope/jackalope>`_ PHPCR implementation that can use the SQLite
RDBMS shipped with PHP itself. On the other end of the spectrum, Jackalope also supports
`Jackrabbit <http://jackrabbit.apache.org>`_, which supports clustering and can efficiently
handle data into the hundreds of gigabytes. By default Jackrabbit simply uses the file system for
persistence, but it can also use an RDBMS. Future versions will support MongoDB, and support for
other NoSQL solutions like CouchDB or Cassandra is entirely possible. Again, switching the persistence
solution requires no code changes, as the business logic is only bound to the PHPCR interfaces.

PHPCR ODM
~~~~~~~~~

As mentioned above, using PHPCR does not mean giving up on RDBMS. In many ways PHPCR can be considered
a specialized ORM solution for CMS. However, while PHPCR works with so-called *nodes*, with an ORM
people expect to be able to map class instances to a persistence layer. This is exactly what PHPCR ODM
provides. It follows the same interface classes as Doctrine ORM while also exposing all the additional
capabilities of PHPCR, like trees and versioning. Furthermore, it provides native support for
translations, covering the only omission of PHPCR from the above list of CMS storage
requirements.
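
To illustrate the difference between working with nodes and working with mapped objects,
here is a toy Python mapper. The ``persist()``/``flush()``/``find()`` names echo the
unit-of-work style Doctrine uses, but ``DocumentManager`` here is hypothetical, handles
only flat dataclass fields, and omits everything PHPCR ODM adds on top, such as trees,
versioning and translations.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class Article:
    """A plain class; its `path` doubles as the id of the node it maps to."""
    path: str
    title: str
    body: str = ""


class DocumentManager:
    """Toy object-to-node mapper; a sketch of the idea, not PHPCR ODM itself."""

    def __init__(self) -> None:
        # path -> node properties; stands in for the underlying node storage
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self._pending: List[Any] = []

    def persist(self, document: Any) -> None:
        """Schedule a document to be written on the next flush()."""
        self._pending.append(document)

    def flush(self) -> None:
        """Write pending documents as nodes: fields become node properties."""
        for doc in self._pending:
            props = asdict(doc)
            path = props.pop("path")
            props["class"] = type(doc).__name__  # remember which class to rebuild
            self.nodes[path] = props
        self._pending.clear()

    def find(self, cls: Type[T], path: str) -> Optional[T]:
        """Rebuild an instance of `cls` from the node at `path`, if it matches."""
        props = self.nodes.get(path)
        if props is None or props.get("class") != cls.__name__:
            return None
        names = {f.name for f in fields(cls)}
        return cls(path=path, **{k: v for k, v in props.items() if k in names})


dm = DocumentManager()
dm.persist(Article("/cms/content/news", "Release day", "Hello world"))
dm.flush()
print(dm.nodes["/cms/content/news"])
print(dm.find(Article, "/cms/content/news"))
```

The first ``print`` shows the stored node (``title``, ``body`` and the recorded class);
the second shows an equal ``Article`` rebuilt from that node.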