From 50a3c2db55e523f02733244b63f18fc6938d7ec6 Mon Sep 17 00:00:00 2001 From: Jason Desrosiers Date: Tue, 28 Jun 2022 12:53:23 -0700 Subject: [PATCH] Bring over changes from the patch release Many of the changes in master for the patch release have not been applied to draft-next. This is an attempt to merge those changes. --- jsonschema-core.xml | 306 ++++++++++++++++++++++++-------------- jsonschema-validation.xml | 49 ++++-- 2 files changed, 225 insertions(+), 130 deletions(-) diff --git a/jsonschema-core.xml b/jsonschema-core.xml index b6a4222f..aabd6589 100644 --- a/jsonschema-core.xml +++ b/jsonschema-core.xml @@ -21,7 +21,7 @@ - + JSON Schema: A Media Type for Describing JSON Documents @@ -38,6 +38,7 @@ + Postman
ben@jsonschema.dev https://jsonschema.dev @@ -51,7 +52,7 @@
- + Internet Engineering Task Force JSON Schema @@ -316,8 +317,8 @@ of five categories: - control schema identification through setting the schema's - canonical IRI and/or changing how the base IRI is determined + control schema identification through setting an IRI + for the schema and/or changing how the base IRI is determined produce a boolean result when applied to an instance @@ -400,6 +401,13 @@ of any vocabulary, there is no analogous mechanism to indicate individual keyword usage. + + A schema vocabulary can be defined by anything from an informal description + to a standards proposal, depending on the audience and interoperability + expectations. In particular, in order to facilitate vocabulary use within + non-public organizations, a vocabulary specification need not be published + outside of its scope of use. +
@@ -420,13 +428,25 @@ A JSON Schema resource is a schema which is canonically identified by an - absolute IRI. + absolute IRI. Schema resources MAY + also be identified by IRIs, including IRIs with fragments, + if the resulting secondary resource (as defined by + section 3.5 of RFC 3986) is identical + to the primary resource. This can occur with the empty fragment, + or when one schema resource is embedded in another. Any such IRIs + with fragments are considered to be non-canonical. The root schema is the schema that comprises the entire JSON document in question. The root schema is always a schema resource, where the IRI is determined as described in section . + + Note that documents that embed schemas in another format will not + have a root schema resource in this sense. Exactly how such usages + fit with the JSON Schema document and resource concepts will be + clarified in a future draft. + Some keywords take schemas themselves, allowing JSON Schemas to be nested: @@ -724,9 +744,9 @@ be able to support those keywords or vocabularies that contain them.
-
+
- Identifiers set the canonical IRI of a schema, or affect how such IRIs are + Identifiers define IRIs for a schema, or affect how such IRIs are resolved in references, or both. The Core vocabulary defined in this document defines several identifying keywords, most notably "$id". @@ -1335,25 +1355,32 @@ If present, the value for this keyword MUST be a string, and MUST represent a valid IRI-reference. This IRI-reference SHOULD be normalized, and MUST resolve to an - absolute-IRI (without a fragment). Therefore, - "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT contain an - empty fragment. + absolute-IRI (without a fragment), + or to an IRI with an empty fragment. + + + The empty fragment form is NOT RECOMMENDED and is retained only + for backwards compatibility, and because the + application/schema+json media type defines that an IRI with an + empty fragment identifies the same resource as the same IRI + with the fragment removed. However, since this equivalence is not + part of the RFC 3986 normalization process, + implementers and schema authors cannot rely on generic IRI libraries + understanding it. - Since an empty fragment in the context of the application/schema+json media - type refers to the same resource as the base IRI without a fragment, - an implementation MAY normalize a IRI ending with an empty fragment by removing - the fragment. However, schema authors SHOULD NOT rely on this behavior - across implementations. + Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT + contain an empty fragment. The absolute-IRI form MUST be considered + the canonical IRI, regardless of the presence or absence of an empty fragment. - This is primarily allowed because older meta-schemas have an empty - fragment in their $id (or previously, id). A future draft may outright - forbid even empty fragments in "$id". + An empty fragment is currently allowed because older meta-schemas have + an empty fragment in their $id (or previously, id). 
+ A future draft may outright forbid even empty fragments in "$id". - This IRI also serves as the base IRI for relative IRI-references in keywords - within the schema resource, in accordance with + The absolute-IRI also serves as the base IRI for relative IRI-references + in keywords within the schema resource, in accordance with RFC 3987 section 6.5 and RFC 3986 section 5.1.1 regarding base URIs embedded in content. @@ -1619,7 +1646,7 @@ media type. - Unless the "$id" keyword described in the next section is present in the + Unless the "$id" keyword described in an earlier section is present in the root schema, this base IRI SHOULD be considered the canonical IRI of the schema document's root schema resource. @@ -1746,7 +1773,7 @@ Since JSON Pointer IRI fragments are constructed based on the structure of the schema document, an embedded schema resource and its subschemas can be identified by JSON Pointer fragments relative to either its own - canonical IRI, or relative to the containing resource's IRI. + canonical IRI, or relative to any containing resource's IRI. Conceptually, a set of linked schema resources should behave @@ -1778,13 +1805,18 @@ } ]]> - - The IRI "https://example.com/foo#/items/additionalProperties" - points to the schema of the "additionalProperties" keyword in - the embedded resource. The canonical IRI of that schema, however, - is "https://example.com/bar#/additionalProperties". - + + The IRI "https://example.com/foo#/items" points to the "items" schema, + which is an embedded resource. The canonical IRI of that schema + resource, however, is "https://example.com/bar". + + + For the "additionalProperties" schema within that embedded resource, + the IRI "https://example.com/foo#/items/additionalProperties" points + to the correct object, but that object's IRI relative to its resource's + canonical IRI is "https://example.com/bar#/additionalProperties". +
Now consider the following two schema resources linked by reference @@ -1806,38 +1838,47 @@ ]]> - Here we see that the canonical IRI for that "additionalProperties" - subschema is still valid, while the non-canonical IRI with the fragment - beginning with "#/items/$ref" now resolves to nothing. + Here we see that "https://example.com/bar#/additionalProperties", + using a JSON Pointer fragment appended to the canonical IRI of + the "bar" schema resource, is still valid, while + "https://example.com/foo#/items/additionalProperties", which relied + on a JSON Pointer fragment appended to the canonical IRI of the + "foo" schema resource, no longer resolves to anything.
Note also that "https://example.com/foo#/items" is valid in both arrangements, but resolves to a different value. This IRI ends up - functioning similarly to a retrieval IRI for a resource. While valid, - examining the resolved value and either using the "$id" (if the value - is a subschema), or resolving the reference and using the "$id" of the - reference target, is preferable. + functioning similarly to a retrieval IRI for a resource. While this IRI + is valid, it is more robust to use the "$id" of the embedded or referenced + resource unless it is specifically desired to identify the object containing + the "$ref" in the second (non-embedded) arrangement. - An implementation MAY choose not to support addressing schemas - by non-canonical IRIs. As such, it is RECOMMENDED that schema authors only - use canonical IRIs, as using non-canonical IRIs may reduce - schema interoperability. + An implementation MAY choose not to support addressing schema resource + contents by IRIs using a base other than the resource's canonical IRI, + plus a JSON Pointer fragment relative to that base. Therefore, schema + authors SHOULD NOT rely on such IRIs, as using them may reduce interoperability. This is to avoid requiring implementations to keep track of a whole stack of possible base IRIs and JSON Pointer fragments for each, given that all but one will be fragile if the schema resources - are reorganized. Some have argued that this is easy so there is + are reorganized. Some + have argued that this is easy so there is no point in forbidding it, while others have argued that it complicates schema identification and should be forbidden. Feedback on this topic is encouraged. 
+ After some discussion, we feel that we need to remove the use of + "canonical" in favour of talking about JSON Pointers which reference + across schema resource boundaries as undefined or even forbidden behavior + (https://github.com/json-schema-org/json-schema-spec/issues/937, + https://github.com/json-schema-org/json-schema-spec/issues/1183) - Further examples of such non-canonical IRIs, as well as the appropriate - canonical IRIs to use instead, are provided in appendix - . + Further examples of such non-canonical IRI construction, as well as + the appropriate canonical IRI-based fragments to use instead, + are provided in appendix .
@@ -2064,13 +2105,6 @@ The current IRI for the corresponding meta-schema is: . - - Updated vocabulary and meta-schema IRIs MAY be published between - specification drafts in order to correct errors. Implementations - SHOULD consider IRIs dated after this specification draft and - before the next to indicate the same syntax and semantics - as those listed here. -
Schema keywords typically operate independently, without @@ -2088,7 +2122,8 @@ "items", whose behavior is defined in terms of "prefixItems" - "contains", whose behavior is defined in terms of "minContains" + "contains", whose behavior is affected by the presence and value of + "minContains", in the Validation vocabulary @@ -2340,6 +2375,8 @@ positions within the instance array, it produces an annotation result of boolean true, indicating that all remaining array elements have been evaluated against this keyword's subschema. + This annotation affects the behavior of "unevaluatedItems" in the + Unevaluated vocabulary. Omitting this keyword has the same assertion behavior as @@ -2352,6 +2389,37 @@ Implementations that do not support annotation collection MUST do so.
+ +
+ + The value of this keyword MUST be a valid JSON Schema. + + + An array instance is valid against "contains" if at least one of + its elements is valid against the given schema, + except when "minContains" is present and has a value of 0, in which + case an array instance MUST be considered valid against the "contains" keyword, + even if none of its elements is valid against the given schema. + + + This keyword produces an annotation value which is an array of + the indexes to which this keyword validates successfully when applying + its subschema, in ascending order. The value MAY be a boolean "true" if + the subschema validates successfully when applied to every index of the + instance. The annotation MUST be present if the instance array to which + this keyword's schema applies is empty. + + + This annotation affects the behavior of "unevaluatedItems" in the + Unevaluated vocabulary, and MAY also be used to implement the + "minContains" and "maxContains" keywords in the Validation vocabulary. + + + The subschema MUST be applied to every array element even after the first + match has been found, in order to collect annotations for use by other + keywords. This is to ensure that all possible annotations are collected. + +
@@ -2369,6 +2437,8 @@ The annotation result of this keyword is the set of instance property names matched by this keyword. + This annotation affects the behavior of "additionalProperties" (in + this vocabulary) and "unevaluatedProperties" in the Unevaluated vocabulary. Omitting this keyword has the same assertion behavior as @@ -2392,6 +2462,8 @@ The annotation result of this keyword is the set of instance property names matched by this keyword. + This annotation affects the behavior of "additionalProperties" (in this + vocabulary) and "unevaluatedProperties" (in the Unevaluated vocabulary). Omitting this keyword has the same assertion behavior as @@ -2418,6 +2490,8 @@ The annotation result of this keyword is the set of instance property names validated by this keyword's subschema. + This annotation affects the behavior of "unevaluatedProperties" + in the Unevaluated vocabulary. Omitting this keyword has the same assertion behavior as @@ -2429,6 +2503,17 @@ checking the names in "properties" and the patterns in "patternProperties" against the instance property set. Implementations that do not support annotation collection MUST do so. + + In defining this option, it seems there is the potential for + ambiguity in the output format. The ambiguity does not affect validation results, + but it does affect the resulting output format. + The ambiguity allows for multiple valid output results depending on whether annotations + are used or a solution that "produces the same effect" as draft-07. It is understood + that annotations from failing schemas are dropped. + See our + [Decision Record](https://github.com/json-schema-org/json-schema-spec/tree/HEAD/adr/2022-04-08-cref-for-ambiguity-and-fix-later-gh-spec-issue-1172.md) + for further details. +
@@ -2526,16 +2611,9 @@ <https://json-schema.org/draft/next/vocab/unevaluated>. - The current URI for the corresponding meta-schema is: + The current IRI for the corresponding meta-schema is: . - - Updated vocabulary and meta-schema URIs MAY be published between - specification drafts in order to correct errors. Implementations - SHOULD consider URIs dated after this specification draft and - before the next to indicate the same syntax and semantics - as those listed here. -
@@ -2591,6 +2669,7 @@ positions within the instance array, it produces an annotation result of boolean true, analogous to the behavior of "items". + This annotation affects the behavior of "unevaluatedItems" in parent schemas. Omitting this keyword has the same assertion behavior as @@ -2634,6 +2713,7 @@ The annotation result of this keyword is the set of instance property names validated by this keyword's subschema. + This annotation affects the behavior of "unevaluatedProperties" in parent schemas. Omitting this keyword has the same assertion behavior as @@ -2733,8 +2813,8 @@
The absolute, dereferenced location of the validating keyword. The value MUST - be expressed as a full IRI using the canonical IRI of the relevant - schema object, and it MUST NOT include by-reference applicators + be expressed as a full IRI using the canonical IRI of the relevant schema resource + with a JSON Pointer fragment, and it MUST NOT include by-reference applicators such as "$ref" or "$dynamicRef" as non-terminal path components. It MAY end in such keywords if the error or annotation is for that keyword, such as an unresolvable reference. @@ -3156,20 +3236,6 @@ https://example.com/schemas/common#/$defs/count/minimum Type name: application Subtype name: schema+json Required parameters: N/A - - Optional parameters: - - - A non-empty list of space-separated IRIs, each identifying - a JSON Schema resource. The instance SHOULD successfully - validate against at least one of these meta-schemas. - Non-validating meta-schemas MAY be included for purposes such - as allowing clients to make use of older versions of - a meta-schema as long as the runtime instance validates - against that older version. - - - Encoding considerations: Encoding considerations are identical to those specified for the "application/json" @@ -3200,20 +3266,7 @@ https://example.com/schemas/common#/$defs/count/minimum Type name: application Subtype name: schema-instance+json - - Required parameters: - - - A non-empty list of space-separated IRIs, each identifying - a JSON Schema resource. The instance SHOULD successfully - validate against at least one of these schemas. - Non-validating schemas MAY be included for purposes such - as allowing clients to make use of older versions of a schema - as long as the runtime instance validates against that - older version. 
- - - + Required parameters: N/A Encoding considerations: Encoding considerations are identical to those specified for the "application/json" @@ -3277,9 +3330,9 @@ https://example.com/schemas/common#/$defs/count/minimum - + - + @@ -3358,7 +3411,7 @@ https://example.com/schemas/common#/$defs/count/minimum - The schemas at the following URI-encoded JSON + The schemas at the following IRI-encoded JSON Pointers (relative to the root schema) have the following base IRIs, and are identifiable by any listed IRI in accordance with sections and @@ -3368,10 +3421,10 @@ https://example.com/schemas/common#/$defs/count/minimum - + https://example.com/root.json - + https://example.com/root.json# @@ -3379,21 +3432,21 @@ https://example.com/schemas/common#/$defs/count/minimum https://example.com/root.json - + https://example.com/root.json#foo - + https://example.com/root.json#/$defs/A - https://example.com/other.json - + https://example.com/other.json + https://example.com/other.json# - + https://example.com/root.json#/$defs/B @@ -3401,49 +3454,61 @@ https://example.com/schemas/common#/$defs/count/minimum https://example.com/other.json - + https://example.com/other.json#bar - + https://example.com/other.json#/$defs/X - + https://example.com/root.json#/$defs/B/$defs/X - https://example.com/t/inner.json - + https://example.com/t/inner.json + https://example.com/t/inner.json#bar - + https://example.com/t/inner.json# - + https://example.com/other.json#/$defs/Y - + https://example.com/root.json#/$defs/B/$defs/Y - + urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f - + urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f# - + https://example.com/root.json#/$defs/C + + Note: The fragment part of the IRI does not make it canonical or non-canonical, + rather, the base IRI used (as part of the full IRI with any fragment) is what + determines the canonical nature of the resulting full IRI. + + Multiple "canonical" IRIs? 
We acknowledge this is potentially confusing, and + direct you to read the CREF located in the + JSON Pointer fragments and embedded schema resources + section for further comments. + +
@@ -3469,8 +3534,8 @@ https://example.com/schemas/common#/$defs/count/minimum This transformation can be safely and reversibly done as long as all static references (e.g. "$ref") use IRI-references that resolve - to canonical IRIs, and all schema resources have an absolute-IRI - as the "$id" in their root schema. + to IRIs using the canonical resource IRI as the base, and all schema + resources have an absolute-IRI as the "$id" in their root schema. With these conditions met, each external resource can be copied @@ -3478,7 +3543,7 @@ https://example.com/schemas/common#/$defs/count/minimum schema objects, and without changing any aspect of validation or annotation results. The names of the schemas under "$defs" do not affect behavior, assuming they are each unique, as they - do not appear in canonical IRIs for the embedded resources. + do not appear in the canonical IRIs for the embedded resources.
@@ -3880,17 +3945,28 @@ https://example.com/schemas/common#/$defs/count/minimum "contains" now applies to objects as well as arrays + Use IRIs instead of URIs Remove bookending requirement for "$dynamicRef" Add "propertyDependencies" keyword + + + Improve and clarify the "type", "contains", "unevaluatedProperties", and "unevaluatedItems" keyword explanations + Clarify various aspects of "canonical URIs" + Comment on ambiguity around annotations and "additionalProperties" + Clarify Vocabularies need not be formally defined + Remove references to remaining media-type parameters + Fix multiple examples + + "$schema" MAY change for embedded resources Array-value "items" functionality is now "prefixItems" "items" subsumes the old function of "additionalItems" - "contains" and "unevaluatedItems" interactions now specified - Rename $recursive* to $dynamic* + "contains" annotation behavior, and "contains" and "unevaluatedItems" interactions now specified + Rename $recursive* to $dynamic*, with behavior modification $dynamicAnchor defines a fragment like $anchor $dynamic* (previously $recursive) no longer use runtime base URI determination Define Compound Schema Documents (bundle) and processing diff --git a/jsonschema-validation.xml b/jsonschema-validation.xml index 990ec15b..ab9e66db 100644 --- a/jsonschema-validation.xml +++ b/jsonschema-validation.xml @@ -28,7 +28,7 @@ - + JSON Schema Validation: A Vocabulary for Structural Validation of JSON @@ -47,13 +47,14 @@ </author> <author fullname="Ben Hutton" initials="B" surname="Hutton" role="editor"> + <organization>Postman</organization> <address> <email>ben@jsonschema.dev</email> <uri>https://jsonschema.dev</uri> </address> </author> - <date year="2021"/> + <date year="2022"/> <workgroup>Internet Engineering Task Force</workgroup> <keyword>JSON</keyword> <keyword>Schema</keyword> @@ -442,8 +443,9 @@ </t> <t> A value of 0 is allowed, but is only useful for setting a range - of occurrences from 0 to the value of 
"maxContains". A value of - 0 with no "maxContains" causes "contains" to always pass validation. + of occurrences from 0 to the value of "maxContains". A value of + 0 causes "minContains" and "contains" to always pass validation + (but validation can still fail against a "maxContains" keyword). </t> <t> Omitting this keyword has the same behavior as a value of 1. @@ -704,30 +706,34 @@ <list style="hanging"> <t hangText="date-time:"> A string instance is valid against this attribute if it is - a valid representation according to the "date-time" production. + a valid representation according to the "date-time" ABNF rule + (referenced above). </t> <t hangText="date:"> A string instance is valid against this attribute if it is - a valid representation according to the "full-date" production. + a valid representation according to the "full-date" ABNF rule + (referenced above). </t> <t hangText="time:"> A string instance is valid against this attribute if it is - a valid representation according to the "full-time" production. + a valid representation according to the "full-time" ABNF rule + (referenced above). </t> <t hangText="duration:"> A string instance is valid against this attribute if it is - a valid representation according to the "duration" production. + a valid representation according to the "duration" ABNF rule + (referenced above). </t> </list> </t> <t> Implementations MAY support additional attributes using the other - production names defined anywhere in that RFC. If "full-date" or "full-time" + format names defined anywhere in that RFC. If "full-date" or "full-time" are implemented, the corresponding short form ("date" or "time" respectively) MUST be implemented, and MUST behave identically. Implementations SHOULD NOT define extension attributes - with any name matching an RFC 3339 production unless it validates - according to the rules of that production. + with any name matching an RFC 3339 format unless it validates + according to the rules of that format. 
<cref> There is not currently consensus on the need for supporting all RFC 3339 formats, so this approach of reserving the @@ -964,7 +970,7 @@ <t> If the instance value is a string, this property defines that the string - SHOULD be interpreted as binary data and decoded using the encoding + SHOULD be interpreted as encoded binary data and decoded using the encoding named by this property. </t> @@ -972,7 +978,14 @@ Possible values indicating base 16, 32, and 64 encodings with several variations are listed in <xref target="RFC4648">RFC 4648</xref>. Additionally, sections 6.7 and 6.8 of <xref target="RFC2045">RFC 2045</xref> provide - encodings used in MIME. As "base64" is defined in both RFCs, the definition + encodings used in MIME. This keyword is derived from MIME's + Content-Transfer-Encoding header, which was designed to map binary data + into ASCII characters. It is not related to HTTP's Content-Encoding header, + which is used to encode (e.g. compress or encrypt) + the content of HTTP requests and responses. + </t> + <t> + As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context. Note that all of these encodings result in strings consisting only of 7-bit ASCII characters. Therefore, this keyword @@ -1342,9 +1355,9 @@ <author initials="G." 
surname="Dennis"> <organization/> </author> - <date year="2020" month="December"/> + <date year="2022" month="June"/> </front> - <seriesInfo name="Internet-Draft" value="draft-bhutton-json-schema-00" /> + <seriesInfo name="Internet-Draft" value="draft-bhutton-json-schema-01" /> </reference> </references> @@ -1429,6 +1442,12 @@ </t> <t> <list style="hanging"> + <t hangText="draft-bhutton-json-schema-validation-01"> + <list style="symbols"> + <t>Improve and clarify the "minContains" keyword explanation</t> + <t>Remove the use of "production" in favour of "ABNF rule"</t> + </list> + </t> <t hangText="draft-bhutton-json-schema-validation-00"> <list style="symbols"> <t>Correct email format RFC reference to 5321 instead of 5322</t>