
Handling the base URI while evaluating against an instance #868

Closed
@handrews


This issue is about the architectural principles that:

  • You can resolve all uses of the base URI as a pre-processing step
  • Once you do that, evaluating a schema object is the same regardless of its parent

Technically, draft 2019-09 violates these principles, although in practice we can wiggle around it. But we should decide whether these principles hold, and make sure that our spec meets our own principles! Options include:

  • We change how we describe the behavior of a few keywords, and keep both principles (I'm leaning towards this as explained at the end)
  • We decide that you can resolve $id and $ref as a pre-processing step, but in general you might need to keep track of base URIs during evaluation with an instance
  • We decide not to promote the idea of a pre-processing step at all (although you do need one to at least discover all of the static URIs that can be reference targets, so you never entirely get rid of it)

$id and $ref (the only draft-07 keywords to rely on the base URI) can be pre-processed during schema loading by simply setting their values to the full URIs at the same time that you find the various schemas and cache them in some sort of URI-lookup thingy.
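To make the draft-07 pre-processing step concrete, here is a minimal sketch in Python. It assumes schemas are plain dicts, uses a plain dict as the "URI-lookup thingy", and treats every dict value or list item as a possible subschema, which a real implementation would not do (for example, `enum` values are not schemas). The function names `preprocess` and `load` are hypothetical.

```python
from urllib.parse import urljoin, urldefrag


def preprocess(schema, base_uri, cache):
    """Resolve $id and $ref in place and register each resource in cache.

    Simplification: every dict value or list item is walked as if it were
    a subschema, and any fragment in $id is dropped.
    """
    if isinstance(schema, dict):
        if "$id" in schema:
            # $id is resolved against the current base and becomes the new base.
            base_uri = urldefrag(urljoin(base_uri, schema["$id"])).url
            schema["$id"] = base_uri
            cache[base_uri] = schema
        if "$ref" in schema:
            # After this, evaluating the $ref never needs the base URI again.
            schema["$ref"] = urljoin(base_uri, schema["$ref"])
        for key, value in schema.items():
            if key not in ("$id", "$ref"):
                preprocess(value, base_uri, cache)
    elif isinstance(schema, list):
        for item in schema:
            preprocess(item, base_uri, cache)


def load(schema, retrieval_uri):
    """Return a URI -> schema lookup table for one schema document."""
    root = urldefrag(retrieval_uri).url
    cache = {root: schema}
    preprocess(schema, root, cache)
    return cache
```

Once `load` has run, every `$ref` holds an absolute URI, so evaluating a subschema no longer depends on which parent it came from.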

However, in the latest draft, $anchor, $recursiveAnchor, and $recursiveRef rely on the base URI as well.

For $anchor, since it only adds a URI to associate with that schema in your cache, there's no further processing with the base URI to do there. So it's not a problem in practice.
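A small sketch of why $anchor is harmless: the base URI is consumed once, at load time, to build a `base#name` key, and nothing needs it afterwards. This assumes the same dict-based cache as a plain URI-to-schema table; `register_anchors` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urljoin, urldefrag


def register_anchors(schema, base_uri, cache):
    """Add a base#name entry to cache for every $anchor found.

    The base URI is used only here, during loading; evaluation later
    just looks the full URI up in cache.
    """
    if isinstance(schema, dict):
        if "$id" in schema:
            base_uri = urldefrag(urljoin(base_uri, schema["$id"])).url
        if "$anchor" in schema:
            cache[f"{base_uri}#{schema['$anchor']}"] = schema
        for value in schema.values():
            register_anchors(value, base_uri, cache)
    elif isinstance(schema, list):
        for item in schema:
            register_anchors(item, base_uri, cache)
    return cache
```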

But $recursiveAnchor and $recursiveRef are problems. At least in theory. In practice, because we restrict $recursiveAnchor to resource root schemas and $recursiveRef to only have a value of "#", you can handle these without needing to know the base URI. That is, in fact, why those restrictions exist.
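Under the current restrictions, resolution can work purely on schema objects. A minimal sketch, assuming the evaluator tracks the dynamic scope as a list of resource root schemas (outermost first) and already knows which resource root contains the `"$recursiveRef": "#"`; both representations are hypothetical. No base URI appears anywhere.

```python
def resolve_recursive_ref(current_root, dynamic_scope):
    """Resolve "$recursiveRef": "#" under the draft 2019-09 restrictions.

    current_root:  resource root containing the $recursiveRef, which is
                   exactly what "#" points to (the initial target)
    dynamic_scope: resource roots entered so far, outermost first
    """
    initial = current_root
    if initial.get("$recursiveAnchor") is not True:
        return initial
    # Outermost resource root in the dynamic scope that also opts in.
    for root in dynamic_scope:
        if root.get("$recursiveAnchor") is True:
            return root
    return initial
```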

So we can kind-of get away with ignoring this problem for now, and we could change how we talk about these keywords to remove the base URI stuff.


In general, though, these keywords as we currently describe them work by dynamically re-calculating the base URI of the URI-reference in $recursiveRef depending on $recursiveAnchor. This was done so that we could eventually lift those restrictions on the value of $recursiveRef, or replace these keywords with more general $dynamicAnchor and $dynamicRef keywords, since the name "recursive" wouldn't be entirely accurate anymore.

This works as follows:

  • Resolve the URI-reference in $recursiveRef just as you would for $ref to get the initial target
  • If the initial target contains "$recursiveAnchor": true, walk back up the dynamic scope to find the outermost scope that also has "$recursiveAnchor": true, to get the intermediate target
  • Re-resolve the URI-reference in $recursiveRef against the base URI of the intermediate target, to produce the final target
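The three steps above can be sketched roughly as follows, for the general case where the restrictions are lifted. Assumptions: the dynamic scope is tracked as `(base_uri, schema)` pairs, outermost first; the cache is keyed by full URIs including fragments (a simplification, since real implementations evaluate JSON Pointer fragments); and `$recursiveAnchor` may appear on a non-root subschema, which the current spec does not allow. `resolve_dynamic` is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_dynamic(reference, ref_base, dynamic_scope, cache):
    """Return the URI of the final target of a $recursiveRef.

    reference:     the URI-reference as written in $recursiveRef
    ref_base:      base URI of the schema object containing it
    dynamic_scope: (base_uri, schema) pairs, outermost first
    cache:         URI -> schema table built during preprocessing
    """
    # Step 1: resolve exactly like $ref to find the initial target.
    initial_uri = urljoin(ref_base, reference)
    if cache[initial_uri].get("$recursiveAnchor") is not True:
        return initial_uri
    # Step 2: the outermost scope with "$recursiveAnchor": true is the
    # intermediate target; only its base URI matters from here on.
    for scope_base, scope_schema in dynamic_scope:
        if scope_schema.get("$recursiveAnchor") is True:
            # Step 3: re-resolve the ORIGINAL reference against that base.
            return urljoin(scope_base, reference)
    return initial_uri
```

Note that step 3 needs both the original reference string and a base URI at runtime, which is exactly what breaks the pre-processing principle.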

The nice thing about this is that it works with any URI-reference in $recursiveRef (although some sorts of URI-references don't make much sense, which is the other reason we restricted it). And in practice, the "#" restriction means that the final target is always the same as the intermediate target, so you never actually need to re-resolve the URI-reference: once you find the intermediate target, you're done.
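A quick check of why "#" makes step 3 a no-op: resolving "#" against any base lands on that base's own resource, while other references do not. This assumes the intermediate base is an absolute URI without a fragment; `final_is_intermediate` is a hypothetical helper for illustration only.

```python
from urllib.parse import urljoin, urldefrag


def final_is_intermediate(reference, intermediate_base):
    """True if re-resolving `reference` lands on the intermediate resource itself.

    intermediate_base is assumed to be an absolute URI without a fragment.
    """
    url, fragment = urldefrag(urljoin(intermediate_base, reference))
    return url == intermediate_base and fragment == ""
```

With `intermediate_base = "https://example.com/strict-tree"`, this holds for `"#"` but not for `"#/$defs/node"` or `"tree"`, which is why the restriction lets implementations stop at the intermediate target.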

But in the general case, where the intermediate and final targets could be different, you need to know, at runtime, the base URI for both the initial and intermediate targets. You can't even resolve the $recursiveRef to a full URI because you need to know which part was the original reference in order to re-resolve it against the intermediate target's base URI.
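To illustrate why pre-resolving to a full URI loses information: the three hypothetical references below all resolve to the same full URI against the original base, but re-resolving them against a different intermediate base gives different answers. The URIs are made up for the example.

```python
from urllib.parse import urljoin

ORIGINAL_BASE = "https://example.com/schemas/tree"
INTERMEDIATE_BASE = "https://example.com/schemas/strict-tree"

# Three different URI-references an author might write in $recursiveRef.
REFERENCES = [
    "#/$defs/node",
    "tree#/$defs/node",
    "https://example.com/schemas/tree#/$defs/node",
]


def compare(references, original_base, intermediate_base):
    """Pair each reference with its resolution against both bases."""
    return [
        (ref, urljoin(original_base, ref), urljoin(intermediate_base, ref))
        for ref in references
    ]


for ref, before, after in compare(REFERENCES, ORIGINAL_BASE, INTERMEDIATE_BASE):
    print(ref, "|", before, "|", after)
```

All three resolve to `https://example.com/schemas/tree#/$defs/node` against the original base, but only the fragment-only reference moves to `strict-tree` when re-resolved, so storing just the resolved URI would make the re-resolution impossible.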


If we want to keep the ability to preprocess the base URI to the point where we never need to worry about parent schema objects, the best way to do that would be to reserve a keyword ($base? $_base?) where an implementation could safely store the base URI during preprocessing whenever that object contains a keyword that needs it. Then, when $recursiveRef or $recursiveAnchor is encountered, an implementation could just look at that reserved location.
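Here is a rough sketch of that reserved-keyword idea. `$base` is only one of the names floated above, and rejecting an author-supplied `$base` is just one possible answer to the "what if someone uses it" question; both are assumptions, as is the dict-walking simplification.

```python
from urllib.parse import urljoin, urldefrag

BASE_KEY = "$base"  # hypothetical reserved keyword, not in any draft
NEEDS_BASE = ("$recursiveRef", "$recursiveAnchor")


def annotate_bases(schema, base_uri):
    """Store the base URI in every object whose keywords need it at runtime."""
    if isinstance(schema, dict):
        if BASE_KEY in schema:
            # One possible policy (an assumption): refuse author-supplied values.
            raise ValueError(f"{BASE_KEY} is reserved for implementations")
        if "$id" in schema:
            base_uri = urldefrag(urljoin(base_uri, schema["$id"])).url
        for value in list(schema.values()):
            annotate_bases(value, base_uri)
        # Annotate after recursing so our own key never trips the check above.
        if any(keyword in schema for keyword in NEEDS_BASE):
            schema[BASE_KEY] = base_uri
    elif isinstance(schema, list):
        for item in schema:
            annotate_bases(item, base_uri)
```

At evaluation time, an implementation would read `schema["$base"]` instead of tracking parents, so each schema object can still be evaluated on its own.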

There are subtleties there like what to do if someone actually does try to use it as a keyword, etc. But that's what comes to mind for me.

Thoughts? At the moment, I'm leaning towards changing how we talk about $recursiveAnchor and $recursiveRef and saying something like "these keywords adjust the reference target within the dynamic scope" instead of "these keywords change the base URI of the reference." That gets a bit messy with another architectural principle of "always identify things with URIs", but I feel like that is more easily finessed. Besides, we at least start with a URI, and we can figure out the URI of the final target if we want to.
