conditional selection of alternate schemas (includes "switch" and other options) #64

Closed
@handrews

The Problem (and current workarounds)

A common use case is to select an overall validation schema (or schemas) based on how a small subset of the instance validates. In the simplest case, some property in the instance is checked against a set of literal values, and the overall validation schema(s) are chosen based on that literal value. This is generally implemented with oneOf (or anyOf) and enum (although see also the constant proposal in issue #58). Some forms of this problem may also be solved with dependencies.

Note: Throughout this proposal, the elements of the oneOf/anyOf lists are referred to as branches, as in this case they are being used as implicit (or in some options, explicit) conditionals.

Single selection with oneOf

This can be read as "if foo is firstValue, bar must be present and must be a list of numbers, otherwise if foo is secondValue, buzz must be present and must be a string that is at least 10 characters long":

{
    "type": "object",
    "oneOf": [
        {
            "properties": {
                "foo": {"enum": ["firstValue"]},
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {"enum": ["secondValue"]},
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

One difficulty with this approach is that the cause and effect are not clear. This could just as easily be read as "If bar is present as a list of numbers, foo must be set to firstValue. Otherwise if buzz is present and a string of at least ten characters, foo must be set to secondValue."

This is both a strength and a weakness. The oneOf construct can capture complex alternatives, but does not clearly express the idea of one part of the schema being the determining factor. In a small schema like this, either interpretation is easy to read, and it’s easy to spot the enum and guess that it is probably the determinant. In a more complex schema, where perhaps two values of the enum select one branch and a third selects the other, that is much harder to spot.

Multiple selection with anyOf

Here is a similar example using anyOf, where foo being set to multiSelect can potentially validate against two branches (although it is only required to validate against one or the other). If foo is set to singleSelect, it must validate against the lone schema that accepts that value. Multiple matches are prominently addressed in one of the proposals, so we will use this example to consider the options.

{
    "type": "object",
    "required": ["foo"],
    "anyOf": [
        {"properties": {"foo": {"enum": ["multiSelect"]}, "bar": {"type": "number"}}},
        {"properties": {"foo": {"enum": ["multiSelect"]}, "buzz": {"type": "string"}}},
        {"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
    ]
}

In order to require validation against both "multiSelect"-designated schemas, an allOf must be introduced:

{
    "type": "object",
    "anyOf": [
        {
            "properties": {"foo": {"enum": ["multiSelect"]}},
            "allOf": [
                {"properties": {"bar": {"type": "number"}}},
                {"properties": {"buzz": {"type": "string"}}}
            ]
        },
        {"properties": {"foo": {"enum": ["singleSelect"]}, "zippy": {"type": "boolean"}}}
    ]
}

Single selection with dependencies

This schema validates the same set of instances as the schema above that uses oneOf:

{
    "type": "object",
    "properties": {
        "bar": {
            "type": "array",
            "items": {"type": "number"}
        },
        "buzz": {
            "type": "string"
        }
    },
    "dependencies": {
        "bar": {
            "properties": {"foo": {"enum": ["firstValue"]}},
            "required": ["foo", "bar"]
        },
        "buzz": {
            "properties": {"foo": {"enum": ["secondValue"]}},
            "required": ["foo", "buzz"]
        }
    }
}

Note that dependencies can only specify things based on the presence or absence of properties, so the "if bar is present, else if buzz is present" interpretation must be used for this approach. In some cases, that is exactly what needs to be expressed, but it seems to be more common to use a value as the determinant rather than the presence or absence of a particular property.

If the difference between foo being set to firstValue or secondValue was a difference in exactly how bar is validated (and buzz was not part of the schema at all), then the oneOf approach still works just fine, but the dependencies approach is impossible.

Multiple selection with dependencies

{
    "type": "object",
    "properties": {
        "bar": {"type": "number"},
        "buzz": {"type": "string"},
        "zippy": {"type": "boolean"}
    },
    "required": ["foo"],
    "dependencies": {
        "bar": {
            "properties": {"foo": {"enum": ["multiSelect"]}}
        },
        "buzz": {
            "properties": {"foo": {"enum": ["multiSelect"]}}
        },
        "zippy": {
            "properties": {"foo": {"enum": ["singleSelect"]}}
        }
    }
}

Again, the logic is inverted from the most intuitive reading, with the presence or absence of the other properties determining the value of foo. Since (in this multi-select example) "foo" is the only required property, it’s just about possible to make out the intention that "foo"’s value determines how "bar", "buzz", or "zippy" is validated. But it is arguably substantially less clear than the anyOf example, and as with the single selection example, dependencies cannot handle selection based purely on a value.

The proposals

There are two possible approaches, one of which has two variants:

  • A switch validation keyword, more or less as seen in many programming languages. Originally proposed by @geraintluff (with additional discussion in the old repo).
  • An annotation keyword that clarifies the author’s intent without changing validation rules. One form of this was proposed by @mrjj as bounding, and I am proposing an alternate syntax here.

Clarifying intent with an annotation property

This approach does not change validation at all. Rather, it adds one or two annotation properties that allow schema readers or documentation generators to understand the intent of the schema author for how branches are selected.

selectWith: pointers from outside the branches

selectWith is an annotation keyword that appears at the same level as a oneOf or anyOf. It is either a single Relative JSON Pointer or a list of them. The pointers indicate which properties (or array indices, for that matter) are intended to determine which branch of the oneOf (or branches of the anyOf) is/are taken.

The values must allow Relative JSON Pointers (which include regular JSON Pointers) in order to allow a schema to be included in another schema as a child schema. Otherwise, the pointer would need to always have the correct full path, severely limiting re-use capabilities. The pointer is resolved with respect to the instance structure.

As an annotation property, selectWith cannot affect validation. Setting it to point to a non-existent property is legal and does not produce an error (following the general principle that nonsensical schemas are valid). Setting it to a property that will only exist on some branches is also possible and to be expected. Unspecified but allowed instance properties/array elements by default have a blank schema, allowing anything.

Here is our single-select example rewritten with selectWith:

{
    "type": "object",
    "selectWith": "0/foo",
    "oneOf": [
        {
            "properties": {
                "foo": {"enum": ["firstValue"]},
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {"enum": ["secondValue"]},
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

Recall that the pointer is resolved against the instance structure, so "0/foo" reads as "the schemas used to validate this instance property are the ones that determine which branch is taken."

The selectWith for the multi-select anyOf would be identical.

selector: booleans within each branch

selector is an alternative syntax directly derived from @mrjj’s bounding proposal (so called because it put bounds on what parts of the schema needed to be fully processed, and therefore constrained error reporting only to the most relevant branches of oneOf/anyOf constructs).

Instead of one annotation keyword at the top, selector is a boolean annotation keyword that may appear anywhere within child schemas in a branch. If selector is true, the effect is essentially the same as putting a pointer to that location in selectWith.

The only difference is that selectWith pointers are applied to all branches, while selector can be placed in different locations in different branches (and some branches may not have any selector). However, since unspecified properties/array elements have a blank schema (allowing anything) by default, the end effect is the same. The validation outcome remains unchanged no matter which proposal is used.

Here is the single select example using selector:

{
    "type": "object",
    "oneOf": [
        {
            "properties": {
                "foo": {
                    "enum": ["firstValue"],
                    "selector": true
                },
                "bar": {
                    "type": "array",
                    "items": {"type": "number"}
                }
            },
            "required": ["foo", "bar"]
        },
        {
            "properties": {
                "foo": {
                    "enum": ["secondValue"],
                    "selector": true
                },
                "buzz": {
                    "type": "string",
                    "minLength": 10
                }
            },
            "required": ["foo", "buzz"]
        }
    ]
}

@mrjj’s original purpose with bounding was to narrow the scope of validation and therefore produce more specific errors. The approach is simply to validate anything marked "selector": true first, as anything that fails the selector validation will fail validation of the entire branch, so it is not necessary to proceed further with (or report errors related to) that branch.
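
As a rough illustration of how selector relates to selectWith, the following Python sketch walks each branch, finds every subschema marked "selector": true, and emits the pointer that would describe the same location. The function names are hypothetical, and only the properties keyword is traversed (items, nested oneOf and so on are omitted), so treat it as a sketch of the idea rather than a complete walker.

```python
def _escape(token):
    # JSON Pointer escaping: "~" becomes "~0", "/" becomes "~1".
    return token.replace("~", "~0").replace("/", "~1")


def collect_selectors(schema, path=()):
    """Return pointers to every subschema marked "selector": true."""
    found = []
    if schema.get("selector") is True:
        found.append("0" + "".join("/" + _escape(name) for name in path))
    # Assumption: selectors are only looked for under "properties".
    for name, subschema in schema.get("properties", {}).items():
        if isinstance(subschema, dict):
            found.extend(collect_selectors(subschema, path + (name,)))
    return found


def equivalent_select_with(branches):
    """Merge the selectors of all branches into one selectWith-style list."""
    merged = []
    for branch in branches:
        for pointer in collect_selectors(branch):
            if pointer not in merged:
                merged.append(pointer)
    return merged


branches = [
    {"properties": {"foo": {"enum": ["firstValue"], "selector": True},
                    "bar": {"type": "array"}},
     "required": ["foo", "bar"]},
    {"properties": {"foo": {"enum": ["secondValue"], "selector": True},
                    "buzz": {"type": "string", "minLength": 10}},
     "required": ["foo", "buzz"]},
]
print(equivalent_select_with(branches))  # ['0/foo']
```

An implementation that wants to short-circuit could validate just the collected locations first and skip any branch where they fail.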

selectWith and selector comparison

While they may produce slightly different short-circuit validation behavior, neither of these changes the validation outcome.

selector appears within the schema doing the selecting, which makes its effect obvious as soon as you spot it. On the other hand, spotting the selectors scattered throughout a complex set of branches is tedious and error-prone, and implementations will need to walk the branches and locate all of the selectors before being able to use them for short-circuit validation or anything else.

selectWith requires a bit more interpretation for humans who may have to eyeball how a long JSON pointer actually lines up with the branches. However, all selectors are gathered in one place and can be used as soon as they are encountered.

It would be possible to use both, for flexibility (which is why I gave them different names: something may be a selector for an outer oneOf while specifying a selectWith for an inner oneOf). However, I feel that having both adds complexity without providing much gain.

I am obviously biased, but I prefer selectWith simply because it reads much more clearly from the top down (assuming you put it above your branches). It clearly says "These fields are intended to determine which branch should validate." You can then look across the branch schemas and see what the selection conditions are. That might be a bit tricky if the branches are complex, but no more so than trying to spot the selector keywords.

selectWith also more closely matches how a documentation generator would use it: the documentation would reference it as part of the description of the whole branch set, so with selector it would have to collect them into essentially the selectWith value anyway.

switch

(much of this section’s wording is copied directly from @geraintluff)

The purpose of the switch keyword is to express a series of conditional relations: "If A1 then B1, else if A2 then B2, else ...".

Values for switch

The value of switch is an array. The entries in the array must be objects, each containing:

  • then: a schema or a boolean
  • optional if: a schema
  • optional continue: a boolean

Validation of switch

For each object in the switch array:

  • if if is specified:
    • if data is not valid against the schema in if, then continue to the next item in switch
  • if the value of then is a boolean:
    • if the value of then is false, then validation fails
  • if the value of then is a schema:
    • if the data is not valid against the schema in then, then validation fails
  • if continue is set to boolean true, then:
    • continue to the next item in switch
  • otherwise, stop processing switch (no further items are examined)

switch examples

Here is our regular single-select implemented with switch:

{
    "type": "object",
    "switch": [
        {
            "if": {"properties": {"foo": {"enum": ["firstValue"]}}},
            "then": {
                "properties": {
                    "bar": {
                        "type": "array",
                        "items": {"type": "number"}
                    }
                },
                "required": ["foo", "bar"]
            }
        },
        {
            "if": {"properties": {"foo": {"enum": ["secondValue"]}}},
            "then": {
                "properties": {
                    "buzz": {
                        "type": "string",
                        "minLength": 10
                    }
                },
                "required": ["foo", "buzz"]
            }
        }
    ]
}

And here is our regular multi-select. (Since nothing but foo is required and additional properties are allowed, it’s a bit silly to specify "bar" and "buzz" as separate schemas, but pretend they are two schemas that it makes sense to handle this way, because I don’t want to go redo all of the examples.)

{
    "type": "object",
    "required": ["foo"],
    "switch": [
        {
            "if": {"properties": {"foo": {"enum": ["multiSelect"]}}},
            "then": {
                "anyOf": [
                    {"properties": {"bar": {"type": "number"}}},
                    {"properties": {"buzz": {"type": "string"}}}
                ]
            }
        },
        {
            "if": {"properties": {"foo": {"enum": ["singleSelect"]}}},
            "then": {
                "properties": {"zippy": {"type": "boolean"}}
            }
        }
    ]
}

This actually isn’t very interesting: since the two branches associated with a foo of "multiSelect" are more concisely managed with an inner anyOf, the switch can once again only choose one of its conditions. Here is a more complex example adapted from the original proposal:

{
    "type": "object",
    "switch": [
        {
            "if": {
                "properties": {"indicator": {"enum": ["yellow"]}}
            },
            "then": {
                "required": ["warningMessage"]
            },
            "continue": true
        },
        {
            "if": {
                "properties": {
                    "powerLevel": {"minimum": 9000}
                }
            },
            "then": {
                "required": ["disbelief"]
            },
        }
        {
            "then": {
                "required": ["confidence"]
            }
        }
    ]
}

In this example, if there is a yellow indicator, there must also be a warning message.
Whether there is a warning or not, a high enough "powerLevel" requires "disbelief", otherwise it requires "confidence".

Since the "indicator" branch specifies "continue": true, we go ahead and check the other conditions whether we have a yellow indicator or not. And since continue is not specified on the "powerLevel" branch, if we match that condition we will not examine the remaining branches.

Because the last branch does not have an "if" schema, it will always match if we reach it. So the only way we do not match it is if we match the minimum power level, as that will end the processing of the switch before we consider the final branch.

Additionally, the point of allowing then to be a boolean is to provide a concise expression to say that the data must be one of the supplied options, e.g.:

{
    "switch": [
        {"if": ..., "then": ...},
        {"if": ..., "then": ...},
        {"then": false}
    ]
}
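
The steps listed under "Validation of switch" can be sketched in Python as follows. This is not a real validator: is_valid and check_switch are hypothetical names, and is_valid covers only the handful of keywords used in the power level example (enum, minimum, required, properties, plus switch itself). It is meant only to show the control flow of if, then and continue.

```python
def is_valid(schema, instance):
    """Tiny validator covering only the keywords used in the example."""
    if isinstance(schema, bool):
        return schema  # a boolean "then": true passes, false fails
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if ("minimum" in schema and isinstance(instance, (int, float))
            and not isinstance(instance, bool)
            and instance < schema["minimum"]):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            # "properties" only constrains members that are present.
            if name in instance and not is_valid(subschema, instance[name]):
                return False
    if "switch" in schema and not check_switch(schema["switch"], instance):
        return False
    return True


def check_switch(cases, instance):
    """Apply the switch steps: if / then / continue, in array order."""
    for case in cases:
        if "if" in case and not is_valid(case["if"], instance):
            continue  # condition not met: move on to the next item
        if not is_valid(case["then"], instance):
            return False  # matched, but "then" failed (or was false)
        if case.get("continue") is not True:
            break  # matched without "continue": true, so stop here
    return True


schema = {
    "type": "object",
    "switch": [
        {"if": {"properties": {"indicator": {"enum": ["yellow"]}}},
         "then": {"required": ["warningMessage"]},
         "continue": True},
        {"if": {"properties": {"powerLevel": {"minimum": 9000}}},
         "then": {"required": ["disbelief"]}},
        {"then": {"required": ["confidence"]}},
    ],
}

# High power level: the final "confidence" item is never reached.
print(is_valid(schema, {"indicator": "green", "powerLevel": 9001,
                        "disbelief": True}))            # True
# Yellow indicator without a warning message fails on the first item.
print(is_valid(schema, {"indicator": "yellow", "powerLevel": 10,
                        "confidence": 1}))              # False
```

One thing the sketch makes visible: because properties only constrains members that are present, an instance with no "indicator" at all also satisfies the first if, so a schema author who wants the condition to require the property would need to add required to the if schema.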

Comparing the options

selectWith/selector:

  • Geared towards documentation and schema readers
  • Clarifies schema author intent, but does not make the mental model of anyOf/oneOf any more intuitive
  • Does not change validation

switch:

  • Adds a new validation approach
  • Geared towards schema writers and readers
  • More familiar model to many programmers
  • Introduces imperative conditionals to what has previously been a declarative system
