Description
The Problem
Enumerations are often cryptic, particularly when they exist to match legacy systems that valued storage efficiency over readability. While it is possible to include more information with the "title" and "description" fields at the same level as the "enum", it is not possible to associate any additional information with each "enum" value.
There are two use cases:
Documentation
This falls squarely within JSON Schema's goals, and is simply about providing an easily-understood-by-humans string for each "enum" value.
UI Generation
This is analogous to the value+label tuples common in web application frameworks geared towards producing "select" widgets. While JSON Schema is intended to help build UIs, it is debatable whether this is enough of a core goal to motivate features on its own. See also issue #55.
The Proposals
There have been several proposals to address this. The options so far are:
- A parallel array of human-readable names under a different keyword adjacent to "enum"
- A parallel-ish array of [enumValue, humanName] tuples under a different keyword adjacent to "enum"
- Replacing the current "enum" array with an array of [enumValue, humanName] tuples
Because "enum" values can be of any JSON type, while the keys of a JSON object must be strings, it is not possible to have a JSON object mapping values to names. This is why lists of tuples are proposed instead.
@geraintluff proposed the parallel array of names, under the keyword "enumNames": https://github.com/json-schema/json-schema/wiki/enumNames-(v5-proposal)
@nemesisdesign proposed replacing "enum" with a tuple array, using the keyword "choices", drawn from web app frameworks: https://github.com/json-schema/json-schema/wiki/choices-(v5-proposal-to-enhance-enum)
@sam-at-github proposed the parallel-ish array of tuples, under the keyword "enumLut" (although this is more or less the same as the proposed transitional period for moving to "choices"). See the comments in the issue filed for "choices" at the old repository (and also for a discussion of the validity of UI generation as a goal): json-schema/json-schema#211
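To make the three shapes concrete, here is a minimal sketch with the schemas written as Python dicts. The keyword names come from the proposals above; the exact layout under each keyword, and the enum values and labels, are assumptions made up for illustration rather than quotes from the wiki pages.

```python
# Hypothetical schemas showing the three proposed shapes.
# The enum values and labels are invented for illustration.

# Option 1: names in a parallel array, matched to "enum" by position.
parallel_names = {
    "enum": ["A", "I", {"code": 3}],
    "enumNames": ["Active", "Inactive", "Legacy code 3"],
}

# Option 2: [value, name] tuples next to "enum", matched by value.
tuples_alongside = {
    "enum": ["A", "I", {"code": 3}],
    "enumLut": [["A", "Active"], ["I", "Inactive"], [{"code": 3}, "Legacy code 3"]],
}

# Option 3: [value, name] tuples replace "enum" entirely.
tuples_replacing = {
    "choices": [["A", "Active"], ["I", "Inactive"], [{"code": 3}, "Legacy code 3"]],
}


def pairs_from_parallel(schema):
    """Pair each enum value with the name at the same index."""
    return [[value, name] for value, name in zip(schema["enum"], schema["enumNames"])]


# All three shapes carry the same value/name pairs.
pairs = pairs_from_parallel(parallel_names)
assert pairs == tuples_alongside["enumLut"] == tuples_replacing["choices"]
```

Note that the value {"code": 3} could not serve as an object key, which is why none of the shapes uses an object to map values to names.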
Pros and cons
- Separate keywords for enum values and human-readable names preserve our existing distinction between validation keywords and annotation keywords
- Parallel arrays are error-prone and very difficult to manage with anything but a very short enumeration
- Making the array hold tuples so that the order is irrelevant makes it more robust, but involves duplication. If the enum value itself is a complex object or list, the duplication can get non-trivial
- Replacing "enum" with a new keyword that holds tuples is disruptive, and combines validation and annotation into one keyword, which we've otherwise avoided
- A list of tuples, whether in addition to or in place of "enum", matches how many web development frameworks set up <select> inputs in forms.
In terms of schema design purity, the parallel array of names is the best solution. "enum" remains a validation property, and "enumNames" (or whatever we call the parallel array) is an annotation property.
In terms of ease of use, replacing the current value list with a tuple list is the best option. It removes any possibility of mis-matching values and names, and avoids any duplication. The cost is some syntactic noise for unnamed enums as the entries need to be tuples whether there are names or not.
In terms of flexibility, the parallel-ish array of tuples, which is keyed by the value rather than matched strictly by order, is the best option. It allows unnamed enums to continue to work exactly as they already do. We also preserve the validation vs annotation property separation. And it is not vulnerable to mismatches by miscounting. The cost is that the enum values must be duplicated, and the two copies can then get out of sync.
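The difference between matching by position and matching by value can be sketched as follows. Both helper functions and the sample data are hypothetical, written only to show how each approach behaves when someone adds an enum value and forgets to update the names.

```python
def label_by_position(enum, names, value):
    """Parallel-array lookup: the name at the same index as the value."""
    index = enum.index(value)
    return names[index] if index < len(names) else None


def label_by_value(pairs, value):
    """Tuple-list lookup: the name from the pair whose value matches."""
    for candidate, name in pairs:
        if candidate == value:
            return name
    return None


# "S" was inserted into the enum, but the names were not updated.
enum = ["A", "S", "I", "P"]
names = ["Active", "Inactive", "Pending"]
pairs = [["A", "Active"], ["I", "Inactive"], ["P", "Pending"]]

# By position, every name after the insertion point silently shifts.
shifted = [label_by_position(enum, names, v) for v in enum]

# By value, existing names stay attached; only the new value is unnamed.
stable = [label_by_value(pairs, v) for v in enum]
```

In this sketch the positional lookup labels "S" as "Inactive" and "I" as "Pending", while the value-keyed lookup simply has no name for "S".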
Steps towards a resolution
We should decide whether the separation of validation and annotation keywords is a fundamental part of the JSON Schema approach (again, see issue #55). If it is, then we can discard the "replace with a list of tuples" option, as it would be used for both validation and annotation. It would be the only annotation that leaves noise in the validation syntax even when it is not used. The value itself may be a tuple, so the top level must always be a tuple in order to avoid ambiguity, even if there is no name present.
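The ambiguity mentioned above can be illustrated with a small parser for the "replace with a list of tuples" option. This is only a sketch: the function name is hypothetical, and writing an unnamed entry as a one-element list is an assumption, since the proposal does not pin down that detail here.

```python
def parse_entries(entries):
    """Read entries that are always [value] or [value, name].

    The one-element form for "no name" is an assumption of this sketch.
    """
    parsed = []
    for entry in entries:
        if not isinstance(entry, list) or len(entry) not in (1, 2):
            raise ValueError(f"entry must be [value] or [value, name]: {entry!r}")
        value = entry[0]
        name = entry[1] if len(entry) == 2 else None
        parsed.append((value, name))
    return parsed


entries = [
    ["A", "Active"],  # value "A" with a name
    ["I"],            # value "I" without a name, still wrapped
    [["x", "y"]],     # the value is itself the list ["x", "y"], no name
]
parsed = parse_entries(entries)

# If the last entry were written bare as ["x", "y"], it would be read as
# value "x" named "y", which is the ambiguity the wrapping avoids.
misread = parse_entries([["x", "y"]])
```

The wrapping on the unnamed entries is the syntactic noise that remains even when no names are used.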
If we do settle on the validation/annotation split principle, we're down to either adding a list of names that must be strictly parallel to the list of values, or we must add a list of tuples that are correlated by the value in the tuple. The former option is likely to get out of order or end up with the wrong number of entries, while the latter is likely to end up with values out of sync.
For simple values, keeping the values in sync should be pretty easy, but if enums supply complex data structure values, bugs are likely. I suspect that complex values in enums are quite rare.
For small sets of values, keeping lists in parallel should be easy, but long enums will lead to bugs. I suspect that long lists are more common than complex values.
If long lists are more common than complex values, we should choose the option that is more robust for long lists, which is the list of tuples. I'd appropriate the "enumNames" keyword for it, even though that was proposed for the list of names, because it clearly ties the list of tuples to the "enum" property.
One mitigation for bugs involving values getting out of sync is that a debug mode could easily check that every value in the tuple list is an actual value of the corresponding enum. I am NOT proposing this as a step in validating instances; JSON Schema seems to generally be fine with nonsensical schemas (although that's another principle that we should confirm in issue #55). I am just speculating about an additional tool, like a linter for JSON Schema.
The point is that a theoretical linter could detect the most likely bugs from using a list of tuples, but the only thing such a linter could check with the list of names is that it is not longer than the enumeration. I think this, plus the likelihood of long enumerations vs complex values, gives the list of tuples alongside the existing "enum" list the edge.
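As a rough sketch of what such a linter could check, the two functions below compare the checks available for each option. Both function names and the sample data are hypothetical; no such tool exists as part of JSON Schema.

```python
def lint_value_tuples(enum, pairs):
    """Return tuple-list values that do not appear in "enum".

    Uses Python's == for comparison, which treats 1 and True as equal;
    a real linter would need JSON-aware equality instead.
    """
    return [value for value, _name in pairs if value not in enum]


def lint_parallel_names(enum, names):
    """The only structural check available: names is not longer than enum."""
    return len(names) <= len(enum)


enum = ["A", "I", {"code": 3}]

# Tuple list with a typo ("X") and a stale complex value ({"code": 4}).
pairs = [["A", "Active"], ["X", "Inactive"], [{"code": 4}, "Legacy"]]
stale = lint_value_tuples(enum, pairs)

# Parallel names with the first two entries swapped by mistake.
names = ["Inactive", "Active", "Legacy"]
looks_fine = lint_parallel_names(enum, names)
```

Here the tuple check reports both out-of-sync values, while the swapped names pass the length check because the lengths still match.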