This repository was archived by the owner on Sep 16, 2021. It is now read-only.

Commit 27b3640

added tutorial on choosing a storage layer
1 parent 8f1cb4a commit 27b3640

Lines changed: 157 additions & 0 deletions
Choosing a storage layer
========================

When building a CMS, the choice of storage layer is one of the key decisions
to make. Many factors must be considered. The good news is that all the
components and Bundles in the CMF take extra care to provide the necessary
extension points to ensure the **CMF remains storage layer agnostic**.

The goal of this tutorial is to explain these considerations and why we suggest
`PHPCR <http://phpcr.github.com>`_ and `PHPCR-ODM <http://www.doctrine-project.org/projects/phpcr-odm.html>`_
as the ideal basis for a CMS. However, all components and Bundles can be
integrated with other solutions with a fairly small amount of work.

.. index:: PHPCR, ODM, ORM

Requirements for a CMS storage layer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the most fundamental level a CMS is about storing content, so the first
requirement is that *a CMS must provide means to store content with different properties*.

A CMS has very different storage needs than, for example, a system for processing orders.
Note, however, that it is entirely possible, and very much an intention of the CMF
initiative, to enable developers to combine the CMF with a system for processing orders.
For example, one could create a shopping solution using the CMF for storing the product
catalog, while using another system for maintaining the inventory, customer data
and orders. This leads to the second requirement, *a CMS must provide means to reference content*,
both content stored inside the CMS and content stored in other systems.

The actual content in a CMS tends to be organized in a tree-like structure, mimicking
a file system. Note that content authors might want to use one structure to organize
the content and different structures to organize other aspects like the menu and the routing.
This leads to the third requirement, *a CMS must provide means to represent the content as a tree structure*.
Furthermore, a fourth requirement is that *a CMS should allow maintaining several independent tree structures*.

In general, data inside a CMS tends to be unstructured. So while several pages inside
the CMS might be very similar, there is a good chance that there will be many variations
needing different extra fields. Therefore *a CMS must not enforce a singular schema for content*.
That being said, in order to better maintain the content structure and to enable UI layers
to display content elements generically, it is important to optionally be able to
express rules that must be followed and that can also help attach additional semantic
meaning. So *a CMS must provide means to optionally define a schema for content elements*.

This requirement also relates to another need: a CMS must make it easy
for content authors to prepare a series of changes in a staging environment that then
need to go online in a single step. This leads to another requirement:
*a CMS should support moving and exporting content between independent tree structures*.
Note that exporting can also be useful for backups.

When making changes it would also be useful to be able to version the change sets,
so that they remain available for historical purposes and can be reverted whenever
needed. Therefore the next requirement is that *a CMS should provide the ability to version content*.

As we live in a globalized world, websites need to provide content in multiple languages
addressing different regions. However, not all pieces of content need to be translated,
and others might only be translated eventually. Until then the user should be presented
the content in one of the available languages, so *a CMS should provide the ability
to store content in different languages, with optional fallback rules*.

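To make the idea of fallback rules more concrete, here is a minimal sketch in Python. It is not how the CMF or PHPCR-ODM implement translations: the ``FALLBACKS`` table, the locale codes and the ``resolve_translation`` function are all assumptions made up for this illustration. The point is only that a lookup walks an ordered list of locales and returns the first one for which content exists.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical fallback chains: requested locale -> locales to try, in order.
FALLBACKS: Mapping[str, Sequence[str]] = {
    "de_CH": ("de_CH", "de", "en"),
    "de": ("de", "en"),
    "fr": ("fr", "en"),
    "en": ("en",),
}


def resolve_translation(
    translations: Mapping[str, str],
    locale: str,
    fallbacks: Mapping[str, Sequence[str]] = FALLBACKS,
) -> Optional[Tuple[str, str]]:
    """Return (locale_used, text) for the first available translation.

    Locales without a configured chain are only tried as-is.
    Returns None when no candidate locale has content.
    """
    for candidate in fallbacks.get(locale, (locale,)):
        text = translations.get(candidate)
        if text is not None:
            return candidate, text
    return None


# A piece of content that has only been translated into two languages so far.
page_title = {"en": "Welcome", "de": "Willkommen"}
```

With this data, a request for ``de_CH`` would be served the ``de`` title, a request for ``fr`` would fall back to ``en``, and a locale without any configured chain or content yields no result.
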
As a CMS usually tends to store an increasing amount of content, it will become necessary
to provide some way for users to search the content even when they have only a very fuzzy
idea about the content they are looking for. This leads to the requirement that
*a CMS must provide full text search capabilities*, ideally leveraging both the content's
tree structure and the data schema.

Another popular need is limiting read and/or write access to content to specific users
or groups. Ideally this solution would also integrate with the tree structure. So
*a CMS should provide capabilities to define access controls* that leverage the
tree structure to quickly manage access for entire subtrees.

Finally, not all steps in the content authoring process will be done by the same person.
As a matter of fact, there might be multiple steps, and some of them might not be done
by a person at all but executed by a machine. For example,
a photographer might upload a new image, a content author might attach the photo
to some text, then the system automatically generates thumbnails and web-optimized
renditions, and finally an editor decides on the final publication. Therefore
*a CMS should provide capabilities to assist in the management of workflows*.

Summary
~~~~~~~

Here is a summary of the above requirements. Note that some of the requirements are
a *must*, while others are only a *should*. Depending on your use case
you might prioritize features differently:

* a CMS must provide means to store content with different properties
* a CMS must provide means to reference content
* a CMS must provide means to represent the content as a tree structure
* a CMS must provide full text search capabilities
* a CMS must not enforce a singular schema for content
* a CMS must provide means to optionally define a schema for content elements
* a CMS should allow maintaining several independent tree structures
* a CMS should support moving and exporting content between independent tree structures
* a CMS should provide the ability to version content
* a CMS should provide the ability to store content in different languages, with optional fallback rules
* a CMS should provide capabilities to define access controls
* a CMS should provide capabilities to assist in the management of workflows

RDBMS
~~~~~

Looking at the above requirements it becomes apparent that out of the box an RDBMS is
ill-suited to address the needs of a CMS. RDBMS were never intended to store
tree structures of unstructured content. The only requirements an RDBMS covers from
the above list are the ability to store content, some way to reference content,
keeping multiple separate content structures and a basic level of access controls and triggers.

This is not a failing of RDBMS. They were simply designed for a different
use case: the ability to store, manipulate and aggregate structured data. This makes them
ideal for storing inventory and orders.

That is not to say that it is impossible to build a system on top of an RDBMS that addresses
more or even all of the above topics. Some RDBMS natively support recursive queries, which
can be useful for retrieving tree structures. Even if such native support is missing, there
are algorithms like materialized path and nested sets that can enable efficient storage
and retrieval of tree structures for different use cases.

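The two approaches mentioned above can be sketched with the SQLite module that ships with Python. This is an illustration only: the ``node`` table, its columns and the sample paths are assumptions invented for this example and are not taken from any CMF component. The first function walks parent references with a recursive query, the second relies on a materialized path column and a prefix match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each row stores both a parent reference
# (for recursive queries) and its full path (materialized path).
conn.execute(
    "CREATE TABLE node ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, path TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [
        (1, None, "cms", "/cms"),
        (2, 1, "content", "/cms/content"),
        (3, 2, "home", "/cms/content/home"),
        (4, 2, "about", "/cms/content/about"),
        (5, 1, "menu", "/cms/menu"),
    ],
)


def subtree_recursive(conn, root_id):
    """Fetch a subtree by following parent_id with a recursive query."""
    sql = """
        WITH RECURSIVE subtree(id, path) AS (
            SELECT id, path FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.path FROM node n JOIN subtree s ON n.parent_id = s.id
        )
        SELECT path FROM subtree ORDER BY path
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]


def subtree_by_path(conn, root_path):
    """Fetch a subtree with a prefix match on the materialized path."""
    # Escape LIKE wildcards so that names containing % or _ match literally.
    escaped = root_path.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    sql = (
        "SELECT path FROM node "
        "WHERE path = ? OR path LIKE ? ESCAPE '!' ORDER BY path"
    )
    return [row[0] for row in conn.execute(sql, (root_path, escaped + "/%"))]
```

Both functions return the same rows for the ``/cms/content`` subtree. The trade-off is that the recursive variant needs database support for recursive queries, while the path variant works anywhere but requires rewriting the paths of all descendants when a node is moved.
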
The point is, however, that these all require algorithms and code on top of an RDBMS, which
tightly binds your business logic to a particular RDBMS and/or algorithm, even if some
of this can be abstracted. For example, using an ORM one could create a pluggable system for
managing tree structures with different algorithms, which prevents binding the business logic
of the CMS to a particular algorithm.

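What such a pluggable system could look like is sketched below in Python. The ``TreeStorage`` interface, the two in-memory implementations and the ``build_sitemap`` function are hypothetical names chosen for this example; they do not correspond to an existing ORM plugin. The sketch only shows that the calling code depends on the interface and not on the algorithm behind it.

```python
from typing import Dict, List, Protocol, Set


class TreeStorage(Protocol):
    """Hypothetical interface the business logic is written against."""

    def add(self, path: str) -> None: ...

    def descendants(self, path: str) -> List[str]: ...


class MaterializedPathStorage:
    """Stores full paths and finds descendants with a prefix match."""

    def __init__(self) -> None:
        self._paths: Set[str] = set()

    def add(self, path: str) -> None:
        self._paths.add(path)

    def descendants(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self._paths if p.startswith(prefix))


class AdjacencyListStorage:
    """Stores parent -> children links and finds descendants by traversal."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = {}

    def add(self, path: str) -> None:
        parent = path.rsplit("/", 1)[0]
        self._children.setdefault(parent, []).append(path)

    def descendants(self, path: str) -> List[str]:
        found: List[str] = []
        stack = list(self._children.get(path, []))
        while stack:
            node = stack.pop()
            found.append(node)
            stack.extend(self._children.get(node, []))
        return sorted(found)


def build_sitemap(storage: TreeStorage, root: str) -> List[str]:
    """Business logic: knows the interface, not the storage algorithm."""
    return [root] + storage.descendants(root)
```

Swapping ``MaterializedPathStorage`` for ``AdjacencyListStorage`` does not require touching ``build_sitemap``, which is the kind of decoupling described above.
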
PHPCR
~~~~~

PHPCR is essentially a set of interfaces addressing most of the requirements from the above list.
This means that PHPCR is storage agnostic, in the sense that it is possible to
put virtually any persistence solution behind PHPCR. So in the same way as an ORM can support different
tree storage algorithms via some plugin, PHPCR aims to provide an API for the entire breadth of
CMS needs, thereby cleanly separating the business logic of your CMS from the persistence
choice. As a matter of fact, the only feature above not natively supported by PHPCR is support
for translations.

Thanks to the availability of several PHPCR implementations supporting various kinds of persistence
choices, creating a CMS on top of PHPCR means that end users can pick and choose
what works best for them, their available resources, their expertise and their scalability requirements.

For the simplest use cases there is, for example, a Doctrine DBAL based solution provided by the
`Jackalope <https://github.com/jackalope/jackalope>`_ PHPCR implementation that can use the SQLite
RDBMS shipped with PHP itself. On the other end of the spectrum, Jackalope also supports
`Jackrabbit <http://jackrabbit.apache.org>`_, which supports clustering and can efficiently
handle data into the hundreds of gigabytes. Jackrabbit by default simply uses the file system for
persistence, but can also use an RDBMS. Future versions will support MongoDB, and support for
other NoSQL solutions like CouchDB or Cassandra is entirely possible. Again, switching the persistence
solution would require no code changes, as the business logic is only bound to the PHPCR interfaces.

PHPCR ODM
~~~~~~~~~

As mentioned above, using PHPCR does not mean giving up on RDBMS. In many ways PHPCR can be considered
a specialized ORM solution for CMS. However, while PHPCR works with so-called *nodes*, in an ORM
people expect to be able to map class instances to a persistence layer. This is exactly what PHPCR ODM
provides. It follows the same interfaces as the Doctrine ORM while also exposing all the additional
capabilities of PHPCR, like trees and versioning. Furthermore, it also provides native support for
translations, covering the only omission of PHPCR with regard to the above requirements list for a CMS
storage solution.
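
The difference between working with nodes and working with mapped class instances can be illustrated with a small Python sketch. This is not the PHPCR or PHPCR ODM API: the ``Node`` and ``Page`` classes and the two mapping functions are assumptions invented for this example. A node is modelled as a path plus a free-form set of properties, and the mapper converts between that and a typed object.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class Node:
    """Hypothetical stand-in for a repository node: a path plus properties."""

    path: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Page:
    """Hypothetical document class that application code works with."""

    path: str
    title: str
    body: str = ""


def to_node(page: Page) -> Node:
    """Map a Page instance to a node; every field except path becomes a property."""
    props = {f.name: getattr(page, f.name) for f in fields(page) if f.name != "path"}
    return Node(path=page.path, properties=props)


def from_node(node: Node) -> Page:
    """Map a node back to a Page, ignoring properties the class does not know."""
    known = {f.name for f in fields(Page)} - {"path"}
    props = {k: v for k, v in node.properties.items() if k in known}
    return Page(path=node.path, **props)
```

Because the node itself carries no fixed schema, it may hold extra properties that the ``Page`` class does not declare. In this sketch they are simply ignored when mapping back to an object.
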
