Description
This issue is about the architectural principles that:
- You can resolve all uses of the base URI as a pre-processing step
- Once you do that, evaluating a schema object is the same regardless of its parent
Technically, draft 2019-09 violates that, although in practice we can wiggle around it. But we should decide whether these principles hold, and make sure that our spec meets our own principles! Options include:
- We change how we describe the behavior of a few keywords, and keep both principles (I'm leaning towards this as explained at the end)
- We decide that you can resolve $id and $ref as a pre-processing step, but in general you might need to keep track of base URIs during evaluation with an instance
- We decide not to promote the idea of a pre-processing step at all (although you do need one to at least discover all of the static URIs that can be reference targets, so you never entirely get rid of it)
$id and $ref (the only draft-07 keywords to rely on the base URI) can be pre-processed during schema loading by simply setting their values to the full URIs at the same time that you find the various schemas and cache them in some sort of URI-lookup thingy.
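To make the draft-07 case concrete, here is a minimal sketch of that pre-processing pass. The function name `preprocess` and the plain-dict `registry` are assumptions for illustration, not anything from the spec, and the walk is deliberately naive: it descends into every value, so it would also touch things like `const` or `enum` contents that a real implementation must skip.

```python
from urllib.parse import urljoin, urldefrag


def preprocess(schema, base_uri, registry):
    """Rewrite $id and $ref to full URIs in place; register each $id in registry.

    Hypothetical helper: a naive walk that treats every dict as a schema object.
    """
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            # $id establishes a new base URI for this object and its children.
            base_uri = urljoin(base_uri, schema["$id"])
            schema["$id"] = base_uri
            registry[urldefrag(base_uri).url] = schema
        if isinstance(schema.get("$ref"), str):
            # After this, evaluating $ref never needs the parent's base URI.
            schema["$ref"] = urljoin(base_uri, schema["$ref"])
        for key, value in schema.items():
            if key not in ("$id", "$ref"):
                preprocess(value, base_uri, registry)
    elif isinstance(schema, list):
        for item in schema:
            preprocess(item, base_uri, registry)
    return registry
```

Once this has run, each schema object carries everything it needs, which is the second principle above: evaluation no longer depends on where the object sits in its parent.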
However, in the latest draft, $anchor, $recursiveAnchor, and $recursiveRef rely on the base URI as well.
For $anchor, since it only adds a URI to associate with that schema in your cache, there's no further processing with the base URI to do there. So it's not a problem in practice.
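As a small sketch of why $anchor is harmless, the only work is adding one more key to the lookup cache while the base URI is still at hand during loading. The helper name `register_anchor` and the dict-based `registry` are assumptions for illustration.

```python
from urllib.parse import urljoin


def register_anchor(schema, base_uri, registry):
    """Record the extra URI that $anchor gives this schema object.

    Hypothetical helper: the schema itself is left untouched, so nothing
    about the base URI needs to be remembered afterwards.
    """
    anchor = schema.get("$anchor")
    if isinstance(anchor, str):
        registry[urljoin(base_uri, "#" + anchor)] = schema
    return registry
```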
But $recursiveAnchor and $recursiveRef are problems. At least in theory. In practice, because we restrict $recursiveAnchor to resource root schemas and $recursiveRef to only have a value of "#", you can handle these without needing to know the base URI. That is, in fact, why those restrictions exist.
So we can kind-of get away with ignoring this problem for now, and we could change how we talk about these keywords to remove the base URI stuff.
In general, though, these keywords as we currently describe them work by dynamically re-calculating the base URI of the URI-reference in $recursiveRef depending on $recursiveAnchor. This was done so that we could lift those restrictions on the value of $recursiveRef, or replace these keywords with a more general $dynamicAnchor and $dynamicRef, since the name "recursive" wouldn't be entirely accurate anymore.
This works as follows:
- Resolve the URI-reference in $recursiveRef just as you would for $ref to get the initial target
- If the initial target contains "$recursiveAnchor": true, walk back up the dynamic scope to find the outermost scope that also has "$recursiveAnchor": true, to get the intermediate target
- Re-resolve the URI-reference in $recursiveRef against the base URI from the intermediate target, to produce the final target
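The three steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the names `lookup` and `resolve_recursive_ref`, the dict-based `registry`, and the representation of the dynamic scope as a list of resource base URIs ordered outermost first. The lookup only handles empty fragments, which is enough for "#".

```python
from urllib.parse import urljoin, urldefrag


def lookup(registry, uri):
    """Hypothetical cache lookup; this sketch ignores non-empty fragments."""
    return registry[urldefrag(uri).url]


def resolve_recursive_ref(reference, current_base, dynamic_scope, registry):
    """reference is the ORIGINAL URI-reference, not a pre-resolved full URI."""
    # Step 1: resolve as for $ref to get the initial target.
    initial = lookup(registry, urljoin(current_base, reference))
    if initial.get("$recursiveAnchor") is not True:
        return initial
    # Step 2: find the outermost dynamic scope with "$recursiveAnchor": true.
    for scope_base in dynamic_scope:
        intermediate = lookup(registry, scope_base)
        if intermediate.get("$recursiveAnchor") is True:
            # Step 3: re-resolve the same reference against that base URI.
            return lookup(registry, urljoin(scope_base, reference))
    return initial
```

Note that the function needs two base URIs at evaluation time, `current_base` and `scope_base`, which is exactly the information a pre-processing pass would otherwise have thrown away.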
The nice thing about this is that it works with any URI-reference in $recursiveRef (although some sorts of URI-references don't make much sense, which is the other reason we restricted it). And in practice, the "#" restriction means that the final target is always the same as the intermediate target, so you never actually need to re-resolve the URI-reference. Once you find the intermediate target, you're done.
But in the general case, where the intermediate and final targets could be different, you need to know, at runtime, the base URI for both the initial and intermediate targets. You can't even resolve the $recursiveRef to a full URI because you need to know which part was the original reference in order to re-resolve it against the intermediate target's base URI.
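A tiny illustration of that last point, using a made-up unrestricted reference value and made-up base URIs: once the reference has been flattened into a full URI, re-resolving it against another base is a no-op, so the original reference is unrecoverable.

```python
from urllib.parse import urljoin

# All three values are hypothetical, chosen only to show the effect.
reference = "#/$defs/node"  # an unrestricted $recursiveRef value
initial_base = "https://example.com/tree"
intermediate_base = "https://example.com/strict-tree"

# Keeping the original reference lets us target either resource.
initial_target_uri = urljoin(initial_base, reference)
final_target_uri = urljoin(intermediate_base, reference)

# A pre-resolved full URI is absolute, so re-resolving it changes nothing.
re_resolved = urljoin(intermediate_base, initial_target_uri)

print(initial_target_uri)  # https://example.com/tree#/$defs/node
print(final_target_uri)    # https://example.com/strict-tree#/$defs/node
print(re_resolved == initial_target_uri)  # True
```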
If we want to keep the ability to preprocess the base URI to the point where we never need to worry about parent schema objects, the best way to do that would be to reserve a keyword ($base? $_base?) where an implementation could safely store the base URI during preprocessing if there's a keyword in that object that would need it. Then, when $recursiveRef or $recursiveAnchor is encountered, an implementation could just look at that reserved location.
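A rough sketch of what that could look like, assuming the reserved keyword ends up being called `$base` (the name is undecided above, and `annotate_base` is a made-up helper). As with any naive walk, it treats every dict as a schema object and does nothing about a user-supplied `$base`.

```python
from urllib.parse import urljoin

BASE_KEYWORD = "$base"  # hypothetical reserved keyword name
NEEDS_BASE = ("$recursiveRef", "$recursiveAnchor")


def annotate_base(schema, base_uri):
    """Store the base URI only in objects whose keywords will need it later."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            base_uri = urljoin(base_uri, schema["$id"])
        if any(keyword in schema for keyword in NEEDS_BASE):
            schema[BASE_KEYWORD] = base_uri
        for key, value in list(schema.items()):
            if key != BASE_KEYWORD:
                annotate_base(value, base_uri)
    elif isinstance(schema, list):
        for item in schema:
            annotate_base(item, base_uri)
    return schema
```

With this in place, evaluating $recursiveRef can read the original reference and the stored base from the same object, without consulting its parent.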
There are subtleties there like what to do if someone actually does try to use it as a keyword, etc. But that's what comes to mind for me.
Thoughts? At the moment, I'm leaning towards changing how we talk about $recursiveAnchor and $recursiveRef and saying something like "these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference." That gets a bit messy with another architectural principle of "always identify things with URIs", but I feel like that is more easily finessed. Besides, we at least start with a URI, and we can figure out the URI of the final target if we wanted to.