The Problem (and current workarounds)
A common use case is to select an overall validation schema (or schemas) based on how a small subset of the instance validates. In the simplest case, some property in the instance is checked against a set of literal values, and the overall validation schema(s) are chosen based on that literal value. This is generally implemented with oneOf (or anyOf) and enum (although see also the constant proposal in issue #58). Some forms of this problem may also be solved with dependencies.
Note: Throughout this proposal, the elements of the oneOf/anyOf lists are referred to as branches, as in this case they are being used as implicit (or in some options, explicit) conditionals.
Single selection with oneOf
This can be read as "if foo is firstValue, bar must be present and must be a list of numbers, otherwise if foo is secondValue, buzz must be present and must be a string that is at least 10 characters long":
{
"type": "object",
"oneOf": [
{
"properties": {
"foo": {"enum": ["firstValue"]},
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
},
{
"properties": {
"foo": {"enum": ["secondValue"]},
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
]
}
One difficulty with this approach is that the cause and effect are not clear. This could just as easily be read as "If bar is present as a list of numbers, foo must be set to firstValue. Otherwise if buzz is present and a string of at least ten characters, foo must be set to secondValue."
This is both a strength and a weakness. The oneOf construct can capture complex alternatives, but does not clearly express the idea of one part of the schema being the determining factor. In a small schema like this, either interpretation is easy to read, and it’s easy to spot the enum and guess that it is probably the determinant. In a more complex schema, where maybe two values of the enum select one branch and a single value selects the other, that is much harder to spot.
Multiple selection with anyOf
Here is a similar example using anyOf, where foo being set to multiSelect can potentially validate against two branches (although it is only required to validate against one or the other). If foo is set to singleSelect, it must validate against the lone schema that accepts that value. Multiple matches are prominently addressed in one of the proposals, so we will use this example to consider the options.
{
"type": "object",
"required": ["foo"],
"anyOf": [
{"properties": {"foo": {"enum": ["multiSelect"]}, "bar": {"type": "number"}}},
{"properties": {"foo": {"enum": ["multiSelect"]}, "buzz": {"type": "string"}}},
{"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
]
}
In order to require validation against both "multiSelect"-designated schemas, an allOf must be introduced:
{
"type": "object",
"anyOf": [
{
"properties": {"foo": {"enum": ["multiSelect"]}},
"allOf": [
{"properties": {"bar": {"type": "number"}}},
{"properties": {"buzz": {"type": "string"}}}
]
},
{"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
]
}
Single selection with dependencies
This schema validates the same set of instances as the schema above that uses oneOf:
{
"type": "object",
"properties": {
"bar": {
"type": "array",
"items": {"type": "number"}
},
"buzz": {
"type": "string"
}
},
"dependencies": {
"bar": {
"properties": {"foo": {"enum": ["firstValue"]}},
"required": ["foo", "bar"]
},
"buzz": {
"properties": {"foo": {"enum": ["secondValue"]}},
"required": ["foo", "buzz"]
}
}
}
Note that dependencies can only specify things based on the presence or absence of properties, so the "if bar is present, else if buzz is present" interpretation must be used for this approach. In some cases, that is exactly what needs to be expressed, but it seems to be more common to use a value as the determinant rather than the presence or absence of a particular property.
If the difference between foo being set to firstValue or secondValue was a difference in exactly how bar is validated (and buzz was not part of the schema at all), then the oneOf approach still works just fine, but the dependencies approach is impossible.
Multiple selection with dependencies
{
"type": "object",
"properties": {
"bar": {"type": "number"},
"buzz": {"type": "string"},
"zippy": {"type": "boolean"}
},
"required": ["foo"],
"dependencies": {
"bar": {
"properties": {"foo": {"enum": ["multiSelect"]}}
},
"buzz": {
"properties": {"foo": {"enum": ["multiSelect"]}}
},
"zippy": {
"properties": {"foo": {"enum": ["singleSelect"]}}
}
}
}
Again, the logic is inverted from the most intuitive reading, with the presence or absence of the other properties determining the value of foo. Since (in this multi-select example) "foo" is the only required property, it’s just about possible to make out the intention that "foo"’s value determines how "bar", "buzz", or "zippy" is validated. But it is arguably substantially less clear than the anyOf example, and as with the single selection example, dependencies cannot handle selection based purely on a value.
The proposals
There are two possible approaches, one of which has two variants:
- A switch validation keyword, more or less as seen in many programming languages. Originally proposed by @geraintluff (with additional discussion in the old repo).
- An annotation keyword that clarifies the author’s intent without changing validation rules. One form of this was proposed by @mrjj as bounding, and I am proposing an alternate syntax here.
Clarifying intent with an annotation property
This approach does not change validation at all. Rather, it adds one or two annotation properties that allow schema readers or documentation generators to understand the intent of the schema author for how branches are selected.
selectWith: pointers from outside the branches
selectWith is an annotation keyword that appears at the same level as a oneOf or anyOf. It is either a single Relative JSON Pointer or a list of them. The pointers indicate which properties (or array indices, for that matter) are intended to determine which branch of the oneOf (or branches of the anyOf) is/are taken.
The values must allow Relative JSON Pointers (which include regular JSON Pointers) in order to allow a schema to be included in another schema as a child schema. Otherwise, the pointer would need to always have the correct full path, severely limiting re-use capabilities. The pointer is resolved with respect to the instance structure.
As an annotation property, selectWith cannot affect validation. Setting it to point to a non-existent property is legal and does not produce an error (following the general principle that nonsensical schemas are valid). Setting it to a property that will only exist on some branches is also possible and to be expected. Unspecified but allowed instance properties/array elements by default have a blank schema, allowing anything.
Here is our single-select example rewritten with selectWith:
{
"type": "object",
"selectWith": "0/foo",
"oneOf": [
{
"properties": {
"foo": {"enum": ["firstValue"]},
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
},
{
"properties": {
"foo": {"enum": ["secondValue"]},
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
]
}
Recall that the pointer is resolved relative to the instance structure, so "0/foo" reads as "the schemas used to validate this instance’s foo property are the ones that determine which branch is taken."
The selectWith for the multi-select anyOf would be identical.
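To make the pointer resolution concrete, here is a minimal Python sketch of how a tool might look up the instance value that a selectWith pointer refers to. The names resolve_select_with and MISSING are hypothetical and not part of the proposal. The sketch assumes the integer-prefix form of Relative JSON Pointers (such as "0/foo"), accepts regular JSON Pointers as resolved from the instance root, and does not handle the "#" form. A pointer that matches nothing returns a sentinel instead of raising, since pointing at a non-existent property is legal.

```python
import re

MISSING = object()  # hypothetical sentinel: the pointer matched nothing


def resolve_select_with(root, current_path, pointer):
    """Resolve a selectWith pointer such as "0/foo" against an instance.

    root is the whole instance; current_path lists the keys and indexes
    leading from root to the value that the schema holding selectWith
    is validating.
    """
    if pointer == "" or pointer.startswith("/"):
        # Regular JSON Pointer: start from the instance root.
        base, rest = [], pointer
    else:
        match = re.fullmatch(r"(0|[1-9][0-9]*)((?:/.*)?)", pointer)
        if match is None:
            raise ValueError(f"unsupported pointer: {pointer!r}")
        levels_up, rest = int(match.group(1)), match.group(2)
        if levels_up > len(current_path):
            raise ValueError("pointer climbs above the instance root")
        base = list(current_path[: len(current_path) - levels_up])
    # Unescape JSON Pointer tokens ("~1" is "/", "~0" is "~").
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in rest.split("/")[1:]]
    value = root
    for token in base + tokens:
        if isinstance(value, list) and str(token).isdigit() and int(token) < len(value):
            value = value[int(token)]
        elif isinstance(value, dict) and token in value:
            value = value[token]
        else:
            return MISSING  # legal: selectWith may point at nothing
    return value


instance = {"foo": "secondValue", "buzz": "long enough text"}
print(resolve_select_with(instance, [], "0/foo"))
```

Because the pointer starts from wherever the schema is applied, the same "0/foo" keeps working when the schema is embedded as a child of a larger schema; only current_path changes.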
selector: booleans within each branch
selector is an alternative syntax directly derived from @mrjj’s bounding proposal (so called because it put bounds on what parts of the schema needed to be fully processed, and therefore constrained error reporting only to the most relevant branches of oneOf/anyOf constructs).
Instead of one annotation keyword at the top, selector is a boolean annotation keyword that may appear anywhere within child schemas in a branch. If selector is true, the effect is essentially the same as putting a pointer to that location in selectWith.
The only difference is that selectWith pointers are applied to all branches, while selector can be placed in different locations in different branches (and some branches may not have any selector). However, since unspecified properties/array elements have a blank schema (allowing anything) by default, the end effect is the same. The validation outcome remains unchanged no matter which proposal is used.
Here is the single-select example using selector:
{
"type": "object",
"oneOf": [
{
"properties": {
"foo": {
"enum": ["firstValue"],
"selector": true
},
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
},
{
"properties": {
"foo": {
"enum": ["secondValue"],
"selector": true
},
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
}
]
}
@mrjj’s original purpose with bounding was to narrow the scope of validation and therefore produce more specific errors. The approach is simply to validate anything marked "selector": true first, as anything that fails the selector validation will fail validation of the entire branch, so it is not necessary to proceed further with (or report errors related to) that branch.
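The selector-first idea can be sketched in Python as follows. This is only an illustration under stated assumptions: is_valid is a deliberately tiny stand-in for a real validator (it understands only enum, minLength, required, and properties, and ignores every other keyword), and the names selectors_pass, relevant_branches, and BRANCHES are hypothetical. It also assumes selectors are reached only through properties.

```python
def is_valid(schema, data):
    """Tiny stand-in validator: enum, minLength, required, properties only."""
    if "enum" in schema and data not in schema["enum"]:
        return False
    if "minLength" in schema and isinstance(data, str) and len(data) < schema["minLength"]:
        return False
    if isinstance(data, dict):
        if any(name not in data for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in data and not is_valid(sub, data[name]):
                return False
    return True


def selectors_pass(schema, data):
    """Validate only the subschemas marked "selector": true."""
    if schema.get("selector") is True:
        return is_valid(schema, data)
    if isinstance(data, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in data and not selectors_pass(sub, data[name]):
                return False
    return True


def relevant_branches(branches, data):
    """Indexes of the branches worth validating fully and reporting on."""
    return [i for i, branch in enumerate(branches) if selectors_pass(branch, data)]


BRANCHES = [
    {
        "properties": {"foo": {"enum": ["firstValue"], "selector": True}},
        "required": ["foo", "bar"],
    },
    {
        "properties": {
            "foo": {"enum": ["secondValue"], "selector": True},
            "buzz": {"minLength": 10},
        },
        "required": ["foo", "buzz"],
    },
]

instance = {"foo": "secondValue", "buzz": "short"}
for index in relevant_branches(BRANCHES, instance):
    print(index, is_valid(BRANCHES[index], instance))
```

For this instance only the second branch survives the selector check, so an error reporter would describe the too-short buzz and stay silent about the first branch.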
selectWith and selector comparison
While they may produce slightly different short-circuit validation behavior, neither of these changes the validation outcome.
selector appears within the schema doing the selecting, which makes its effect obvious as soon as you spot it. On the other hand, spotting the selectors scattered throughout a complex set of branches is tedious and error-prone, and implementations will need to walk the branches and locate all of the selectors before being able to use them for short-circuit validation or anything else.
selectWith requires a bit more interpretation for humans who may have to eyeball how a long JSON pointer actually lines up with the branches. However, all selectors are gathered in one place and can be used as soon as they are encountered.
It would be possible to use both, for flexibility (which is why I gave them different names: something may be a selector for an outer oneOf while specifying a selectWith for an inner oneOf). I feel like having both adds complexity without providing much gain.
I am obviously biased, but I prefer selectWith simply because it reads much more clearly from the top down (assuming you put it above your branches). It clearly says "These fields are intended to determine which branch should validate." You can then look across the branch schemas and see what the selection conditions are. That might be a bit tricky if the branches are complex, but no more so than trying to spot the selector keywords.
selectWith also more closely matches how a documentation generator would use it: the documentation would reference it as part of the description of the whole branch set, so with selector it would have to collect them into essentially the selectWith value anyway.
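As a rough illustration of that collection step, the following Python sketch walks a set of branches and gathers every "selector": true location into a list of relative pointers, which is essentially a selectWith value. The function name collect_selectors and the sample branches are hypothetical, and the sketch assumes selectors are only nested under properties (not items or other keywords).

```python
def collect_selectors(branches):
    """Gather "selector": true locations as selectWith-style pointers."""
    pointers = []

    def walk(schema, tokens):
        if schema.get("selector") is True:
            # Escape tokens the JSON Pointer way: "~" first, then "/".
            escaped = [t.replace("~", "~0").replace("/", "~1") for t in tokens]
            pointer = "0" + "".join("/" + t for t in escaped)
            if pointer not in pointers:
                pointers.append(pointer)
        for name, sub in schema.get("properties", {}).items():
            walk(sub, tokens + [name])

    for branch in branches:
        walk(branch, [])
    return pointers


branches = [
    {"properties": {"foo": {"enum": ["firstValue"], "selector": True}}},
    {
        "properties": {
            "foo": {"enum": ["secondValue"], "selector": True},
            "meta": {"properties": {"kind": {"selector": True}}},
        }
    },
]
print(collect_selectors(branches))
```

Every branch has to be visited before the result is complete, which is the extra work that a single selectWith keyword at the top avoids.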
switch
(much of this section’s wording is copied directly from @geraintluff)
The purpose of the switch keyword is to express a series of conditional relations: "If A1 then B1, else if A2 then B2, else ...".
Values for switch
The value of switch is an array. The entries in the array must be objects, each containing:
- then: a schema or a boolean
- optional if: a schema
- optional continue: a boolean
Validation of switch
For each object in the switch array:
- if "if" is specified:
  - if data is not valid against the schema in "if", then continue to the next item in switch
- if the value of "then" is a boolean:
  - if the value of "then" is false, then validation fails
- if the value of "then" is a schema:
  - if the data is not valid against the schema in "then", then validation fails
- if "continue" is set to boolean true, then:
  - continue to the next item in switch
- otherwise, stop processing switch (no further items are examined)
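The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: is_valid covers only the handful of keywords used in this proposal's examples, and the names TYPE_CHECKS, is_valid, switch_is_valid, and SCHEMA are hypothetical. Treating a missing "then" as true is an assumption of the sketch.

```python
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def is_valid(schema, data):
    """Return True if data satisfies the supported subset of keywords."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](data):
        return False
    if "enum" in schema and data not in schema["enum"]:
        return False
    if "minimum" in schema and TYPE_CHECKS["number"](data) and data < schema["minimum"]:
        return False
    if isinstance(data, dict):
        if any(name not in data for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in data and not is_valid(sub, data[name]):
                return False
    if "switch" in schema and not switch_is_valid(schema["switch"], data):
        return False
    return True


def switch_is_valid(cases, data):
    """Walk the switch array following the steps listed above."""
    for case in cases:
        if "if" in case and not is_valid(case["if"], data):
            continue  # condition not met: try the next item
        then = case.get("then", True)  # assumption: missing "then" passes
        if then is False:
            return False
        if isinstance(then, dict) and not is_valid(then, data):
            return False
        if case.get("continue") is not True:
            break  # matched without "continue": stop processing
    return True


SCHEMA = {
    "type": "object",
    "switch": [
        {"if": {"required": ["debug"]}, "then": {"required": ["logLevel"]}, "continue": True},
        {
            "if": {"properties": {"kind": {"enum": ["circle"]}}, "required": ["kind"]},
            "then": {"required": ["radius"]},
        },
        {"then": False},
    ],
}

print(is_valid(SCHEMA, {"kind": "circle", "radius": 1}))
print(is_valid(SCHEMA, {"kind": "square"}))
```

Note that the "if" schemas in this sketch list the tested property under required. A properties-only condition also matches an instance that lacks the property entirely, which may or may not be what the schema author intends.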
switch examples
Here is our regular single-select implemented with switch:
{
"type": "object",
"switch": [
{
"if": {"properties": {"foo": {"enum": ["firstValue"]}}},
"then": {
"properties": {
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
}
},
{
"if": {"properties": {"foo": {"enum": ["secondValue"]}}},
"then": {
"properties": {
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
}
]
}
And here is our regular multi-select. (Since nothing but foo is required and additional properties are allowed, it’s a bit silly to specify "bar" and "buzz" as separate schemas, but pretend they are two schemas for which it makes sense to do it this way, because I don’t want to go redo all of the examples.)
{
"type": "object",
"required": ["foo"],
"switch": [
{
"if": {"properties": {"foo": {"enum": ["multiSelect"]}}},
"then": {
"anyOf": [
{"properties": {"bar": {"type": "number"}}},
{"properties": {"buzz": {"type": "string"}}}
]
}
},
{
"if": {"properties": {"foo": {"enum": ["singleSelect"]}}},
"then": {
"properties": {"zippy": {"type": "boolean"}}
}
}
]
}
This actually isn’t very interesting: since the two branches associated with a foo of "multiSelect" are more concisely managed with an inner anyOf, the switch can once again only choose one of its conditions. Here is a more complex example adapted from the original proposal:
{
"type": "object",
"switch": [
{
"if": {
"properties": {"indicator": {"enum": ["yellow"]}}
},
"then": {
"required": ["warningMessage"]
},
"continue": true
},
{
"if": {
"properties": {
"powerLevel": {"minimum": 9000}
}
},
"then": {
"required": ["disbelief"]
}
},
{
"then": {
"required": ["confidence"]
}
}
]
}
In this example, if there is a yellow indicator, there must also be a warning message.
Whether there is a warning or not, a high enough "powerLevel" requires "disbelief", otherwise it requires "confidence".
Since the "indicator" branch specifies "continue": true, we go ahead and check the other conditions whether we have a yellow indicator or not. And since continue is not specified on the "powerLevel" branch, if we match that condition we will not examine the remaining branches.
Because the last branch does not have an "if" schema, it will always match if we reach it. So the only way we do not match it is if we match the minimum power level, as that will end the processing of the switch before we consider the final branch.
Additionally, the point of allowing then to be a boolean is to provide a concise expression to say that the data must be one of the supplied options, e.g.:
{
"switch": [
{"if": ..., "then": ...},
{"if": ..., "then": ...},
{"then": false}
]
}
Comparing the options
selectWith/selector:
- Geared towards documentation and schema readers
- Clarifies schema author intent, but does not make the mental model of anyOf/oneOf any more intuitive
- Does not change validation
switch:
- Adds a new validation approach
- Geared towards schema writers and readers
- More familiar model to many programmers
- Introduces imperative conditionals to what has previously been a declarative system