v5 validation: Clearly document validation principles #55

**NOTE:** This is a request for clarification in v5 and is **not** a proposal to change behavior.

### The Problem

There are several underlying principles of validation that are currently poorly articulated or merely implied. Some of the more contentious arguments over feature proposals stem from an unclear understanding of these principles. Stating them plainly in the specification will help keep the evolution of JSON Schema focused and reduce noise in feature debates.

### Terminology: _indexing_ into a schema

You can index into JSON data by a property name or an array index.  This can be written in JavaScript access form, e.g.  A["foo"], A.foo, or A[0].

_Indexing_ into a schema by a property name or array index number will, within this issue, mean finding the schema that would validate a similarly indexed instance.  So if schema X validates instance A, then:

- X.foo is the schema that is used to validate A.foo in the course of validating A with X.
- X[5] is, similarly, the schema used to validate A[5].

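Note that X.foo will in truth be one of:

- X.properties.foo and/or X.patternProperties.patternThatMatchesFoo (if both apply, or several patterns match, A.foo must validate against all of them)
- X.additionalProperties, if neither of the above applies and additionalProperties is a schema
- {}, the blank schema, if neither of the above applies and additionalProperties is true or absent

If additionalProperties is false and neither of the first two applies, there is no X.foo: A.foo is not allowed at all.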
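The property lookup can be sketched in Python. This is a minimal sketch assuming draft-04 semantics and a schema held as a plain dict; `property_subschemas` is a hypothetical helper name. It returns every subschema that applies to the property, or None when the property is forbidden.

```python
import re
from typing import Optional


def property_subschemas(schema: dict, name: str) -> Optional[list]:
    """Return the subschemas that apply to instance[name] (draft-04 rules).

    [{}] means anything is allowed; None means the property is forbidden
    (additionalProperties is false and nothing else matched).
    """
    matched = []
    props = schema.get("properties", {})
    if name in props:
        matched.append(props[name])
    for pattern, sub in schema.get("patternProperties", {}).items():
        # draft-04 patterns are unanchored ECMA 262 regexes; re.search is
        # the closest Python approximation
        if re.search(pattern, name):
            matched.append(sub)
    if matched:
        return matched
    # additionalProperties defaults to the blank schema when absent
    additional = schema.get("additionalProperties", {})
    if additional is False:
        return None
    if additional is True:
        return [{}]
    return [additional]
```

For example, with X = {"properties": {"foo": {"type": "string"}}, "patternProperties": {"^f": {"maxLength": 3}}, "additionalProperties": false}, both {"type": "string"} and {"maxLength": 3} apply to "foo", while "bar" is rejected outright.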

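Similarly, X[5] will in truth be one of:

- X.items[5], if items is an array with at least six members
- X.additionalItems, if items is an array with fewer than six members and additionalItems is a schema
- X.items, if items is a schema rather than an array
- {}, if none of the above applies because items is absent, or because items is a short array and additionalItems is true or absent

If items is an array with fewer than six members and additionalItems is false, there is no X[5]: A may not have a sixth element.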
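The same lookup for array indexes, as a minimal sketch under draft-04 rules; `item_subschema` is a hypothetical helper name and the schema is assumed to be a plain dict.

```python
from typing import Optional


def item_subschema(schema: dict, index: int) -> Optional[dict]:
    """Return the subschema that applies to instance[index] (draft-04 rules).

    {} means anything is allowed; None means the item is forbidden
    (items is an array that is too short and additionalItems is false).
    """
    items = schema.get("items", {})  # absent "items" behaves like {}
    if isinstance(items, dict):
        # a single schema applies to every element
        return items
    if index < len(items):
        # positional (tuple-style) validation
        return items[index]
    additional = schema.get("additionalItems", {})
    if additional is False:
        return None
    if additional is True:
        return {}
    return additional
```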

"allOf"/"anyOf"/"oneOf"/"not" involve special considerations, which we will revisit within the principles below.  Here are the basics of how indexing applies to them:

If X is an "allOf" with two branches X1 and X2, then:
X.foo is {"allOf": [X1.foo, X2.foo]}
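A minimal Python sketch of indexing an "allOf" by a property name. To keep it short it assumes, as a simplification, that each branch is indexed through "properties" only, with a branch that does not mention the property contributing the blank schema; `index_all_of` is a hypothetical name.

```python
def index_all_of(schema: dict, name: str) -> dict:
    """X.foo for X = {"allOf": [X1, X2, ...]} is {"allOf": [X1.foo, X2.foo, ...]}.

    Simplification: each branch is indexed through "properties" only, and a
    branch that does not mention the property contributes {}.
    """
    return {"allOf": [branch.get("properties", {}).get(name, {})
                      for branch in schema["allOf"]]}
```

Every branch contributes, whether or not it is relevant to the instance, because every branch of an "allOf" must validate A.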

If X is an "anyOf" or "oneOf" with two branches X1 and X2, then X.foo must take into account only the branch(es) that validated A. In the case of "anyOf" that may be both or just one, while in the case of "oneOf" it will _always_ be exactly one of the branches.

- If X2 is the branch of "oneOf" that validates A, then X.foo is X2.foo.
- If both X1 and X2 validate A in an "anyOf", then X.foo is {"anyOf": [X1.foo, X2.foo]}.
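Unlike "allOf", indexing an "anyOf" or "oneOf" depends on the instance, because only the branches that validated A count. The sketch below assumes a hypothetical toy validator (`toy_is_valid`, supporting only "type" and "properties") and, like the "allOf" case, indexes each branch through "properties" only; `index_combinator` is likewise a hypothetical name.

```python
def toy_is_valid(schema: dict, instance) -> bool:
    """Hypothetical mini validator: supports only "type" ("string",
    "integer" or "object") and "properties"; other keywords are ignored."""
    types = {"string": str, "integer": int, "object": dict}
    expected = schema.get("type")
    # Python's bool subclasses int, but JSON booleans are not integers
    if expected is not None and (isinstance(instance, bool)
                                 or not isinstance(instance, types[expected])):
        return False
    if isinstance(instance, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not toy_is_valid(sub, instance[key]):
                return False
    return True


def index_combinator(schema: dict, instance: dict, name: str) -> dict:
    """Index an "anyOf"/"oneOf" schema by a property name, keeping only the
    branches that validated the whole instance."""
    keyword = "anyOf" if "anyOf" in schema else "oneOf"
    passing = [b for b in schema[keyword] if toy_is_valid(b, instance)]
    subs = [b.get("properties", {}).get(name, {}) for b in passing]
    if keyword == "oneOf":
        # a valid instance matches exactly one "oneOf" branch
        if len(passing) != 1:
            raise ValueError("instance is not valid against the oneOf schema")
        return subs[0]
    if not passing:
        raise ValueError("instance is not valid against the anyOf schema")
    return subs[0] if len(subs) == 1 else {"anyOf": subs}
```

With X1 = {"properties": {"foo": {"type": "string"}}} and X2 = {"properties": {"foo": {"type": "integer"}}}, the instance {"foo": "x"} passes only X1, so X.foo is X1.foo; an instance without "foo" passes both, so under "anyOf" X.foo is {"anyOf": [X1.foo, X2.foo]}.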

If X is a "not" schema {"not": Y}, then there is no meaningful index into X. Depending on the rest of how Y is defined, A.foo may or may not validate against Y.foo, even though A as a whole, being valid against X, is guaranteed to fail validation against Y because of the "not".
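A concrete case, sketched with a hypothetical toy validator (`toy_is_valid`, supporting only "type", "properties", "required" and "not"). Both instances fail Y because they lack "bar", so both pass X = {"not": Y}, yet A.foo passes Y.foo for one of them and fails it for the other.

```python
def toy_is_valid(schema: dict, instance) -> bool:
    """Hypothetical mini validator supporting only "type" ("string",
    "integer", "object"), "properties", "required" and "not"."""
    types = {"string": str, "integer": int, "object": dict}
    if "not" in schema and toy_is_valid(schema["not"], instance):
        return False
    expected = schema.get("type")
    # Python's bool subclasses int, but JSON booleans are not integers
    if expected is not None and (isinstance(instance, bool)
                                 or not isinstance(instance, types[expected])):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not toy_is_valid(sub, instance[key]):
                return False
    return True


Y = {"properties": {"foo": {"type": "string"}}, "required": ["bar"]}
X = {"not": Y}
Y_foo = Y["properties"]["foo"]

for A in ({"foo": "x"}, {"foo": 1}):
    # both instances fail Y (no "bar"), so both pass X = {"not": Y} ...
    assert toy_is_valid(X, A)
    # ... yet A.foo passes Y.foo for the first and fails it for the second
    print(A, toy_is_valid(Y_foo, A["foo"]))
# prints {'foo': 'x'} True, then {'foo': 1} False
```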
### Known or Suspected Principles

I am totally making these up off the top of my head.  They are a starting point:  some are missing, and some are probably wrong.  Some are defined, and others are more of a request for someone to explain the principle involved.
#### Context-free validation

Validating an instance against a schema should succeed or fail independently of whether, or where, that schema appears within another schema.

A corollary of this is that if instance A validates against schema X, then indexing into both will produce a sub-instance that validates against the sub-schema.  Since A.foo validates against X.foo in the context of A and X, it must also validate when pulled out to stand alone.
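The corollary can be expressed as a check, sketched here for one level of indexing. It assumes a hypothetical toy validator (`toy_is_valid`, supporting only "type", "properties" and "required") and resolves X.foo through "properties" only; `corollary_holds` is also a hypothetical name. Because each keyword in this toy validator looks only at its own subschema and sub-instance, the corollary holds by construction; a keyword that consulted the enclosing schema or instance could break it.

```python
def toy_is_valid(schema: dict, instance) -> bool:
    """Hypothetical mini validator: "type" ("string", "integer", "object"),
    "properties" and "required" only."""
    types = {"string": str, "integer": int, "object": dict}
    expected = schema.get("type")
    # Python's bool subclasses int, but JSON booleans are not integers
    if expected is not None and (isinstance(instance, bool)
                                 or not isinstance(instance, types[expected])):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not toy_is_valid(sub, instance[key]):
                return False
    return True


def corollary_holds(schema: dict, instance) -> bool:
    """If A is valid against X, every A.foo must be valid against X.foo
    when validated on its own (X.foo resolved via "properties" only)."""
    if not toy_is_valid(schema, instance) or not isinstance(instance, dict):
        return True  # the corollary makes no claim here
    return all(toy_is_valid(sub, instance[key])
               for key, sub in schema.get("properties", {}).items()
               if key in instance)
```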

Notably, if X is {"not": Y}, the impact of this principle is unclear because there is no meaningful X.foo.  The overall context of the "not" _must_ be taken into account in order to say anything.
#### Schemas that cannot possibly validate any instance are considered valid

That this is an underlying principle is clear from reading the spec. However, I have not seen any explanation of the benefit. Is it intended to facilitate extensibility somehow? Is it to avoid burdening validator implementors with expensive and difficult checks? If it is the latter, is letting validation succeed the only possible way to meet that requirement?

One general example is section 4.1 of draft 04, which says: "Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."

Why should a schema such as {"type": "string", "maximum": 10}, which is clearly nonsensical, validate cleanly against the string "foo"?
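To make the behavior concrete, here is a minimal sketch of the section 4.1 rule applied to "maximum". `toy_is_valid` is a hypothetical mini validator that understands only "type" ("string" or "number"), "maximum" and the draft-04 boolean "exclusiveMaximum".

```python
def is_number(value) -> bool:
    # JSON booleans are not numbers, even though Python's bool subclasses int
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def toy_is_valid(schema: dict, instance) -> bool:
    """Hypothetical mini validator for "type" ("string" or "number"),
    "maximum" and "exclusiveMaximum" (draft-04 boolean form)."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(instance, str):
        return False
    if expected == "number" and not is_number(instance):
        return False
    # Draft 04, section 4.1: "maximum" only applies to numbers, so for any
    # other primitive type this keyword succeeds instead of failing.
    if "maximum" in schema and is_number(instance):
        limit = schema["maximum"]
        if schema.get("exclusiveMaximum", False):
            if instance >= limit:
                return False
        elif instance > limit:
            return False
    return True


nonsensical = {"type": "string", "maximum": 10}
print(toy_is_valid(nonsensical, "foo"))  # True: "maximum" is skipped for strings
print(toy_is_valid(nonsensical, 5))      # False: 5 is not a string
```

No instance can ever be constrained by "maximum" here, yet nothing in the rule flags the schema as suspicious.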

Furthermore, why should a "default" value, or "enum" values, that fail validation against the schema be allowed?
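For illustration only, here is a hypothetical lint pass (not something draft 04 requires) that reports "default" and "enum" values the schema itself would reject. It is limited to the "type" keyword; `matches_type` and `lint_schema` are hypothetical names.

```python
def matches_type(schema: dict, value) -> bool:
    """Hypothetical check covering only the draft-04 "type" keyword."""
    checks = {
        "string": lambda v: isinstance(v, str),
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "boolean": lambda v: isinstance(v, bool),
        "null": lambda v: v is None,
        "array": lambda v: isinstance(v, list),
        "object": lambda v: isinstance(v, dict),
    }
    expected = schema.get("type")
    if expected is None:
        return True
    # "type" may be a single name or a list of names
    names = expected if isinstance(expected, list) else [expected]
    return any(checks[name](value) for name in names)


def lint_schema(schema: dict) -> list:
    """Report "default" and "enum" values that the schema would reject."""
    problems = []
    if "default" in schema and not matches_type(schema, schema["default"]):
        problems.append(f"default {schema['default']!r} fails validation")
    for value in schema.get("enum", []):
        if not matches_type(schema, value):
            problems.append(f"enum value {value!r} fails validation")
    return problems


print(lint_schema({"type": "integer", "enum": [1, 2, "three"], "default": "one"}))
```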
#### A minimally conforming validator need only validate syntactical/structural constraints

It may ignore all annotation fields, all hypermedia fields, and all semantic validation fields (currently "format" is the only semantic field).

This is important for answering the objection that a new annotation field (for instance) places a burden on validator implementors. Since any minimal validator must already ignore unrecognized fields in a schema, non-validation schema fields impose no burden on validators.
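A minimal sketch of why this holds: a validator built around a table of the keywords it understands simply never looks at anything else. `VALIDATORS` and `is_valid` are hypothetical names, and the table covers only two keywords to keep the example short.

```python
def is_number(value) -> bool:
    # JSON booleans are not numbers, even though Python's bool subclasses int
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical keyword table; each entry returns True when the keyword passes.
VALIDATORS = {
    # "type" limited to two primitive types to keep the sketch short
    "type": lambda expected, inst: {"string": isinstance(inst, str),
                                    "number": is_number(inst)}[expected],
    # "maxLength" only applies to strings (draft-04 section 4.1)
    "maxLength": lambda limit, inst: not isinstance(inst, str) or len(inst) <= limit,
}


def is_valid(schema: dict, instance) -> bool:
    """Apply only the keywords listed in VALIDATORS. Annotations ("title"),
    hypermedia ("links"), the semantic "format" keyword and unknown keywords
    are all skipped, so adding such keywords costs this validator nothing."""
    return all(check(schema[keyword], instance)
               for keyword, check in VALIDATORS.items()
               if keyword in schema)


schema = {
    "type": "string",
    "maxLength": 5,
    "title": "Short name",
    "format": "email",
    "links": [],
    "x-unrecognized": True,
}
print(is_valid(schema, "abc"))      # True: "format" is never checked
print(is_valid(schema, "abcdefg"))  # False: longer than maxLength
```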

This principle can be inferred from what is marked required or optional and from how each field behaves, but stating it clearly would head off some of the arguments I have observed in other issue discussions.