Description
Background
RDFLib usage frequently involves namespaces to handle IRI prefixes. E.g., to use the concept http://example.org/ontology/Foo, these are three ways to instantiate an rdflib.URIRef for that concept:
import rdflib
# 1.
node_1 = rdflib.URIRef("http://example.org/ontology/Foo")
# 2.
NS_EX = rdflib.Namespace("http://example.org/ontology/")
node_2 = NS_EX["Foo"]
# 3.
node_3 = NS_EX.Foo
assert node_1 == node_2
assert node_1 == node_3
assert node_2 == node_3
CASE scripts, in this repository and other downstream repositories on casework, define Namespace variables to use the shorter 2nd and 3rd forms.
The recent release of UCO, 0.8.0, altered all IRIs (anchored in UCO proposal CP-107). This will necessitate revision of all code bases that had followed CASE's strategy.
One possible approach for mitigating required effort is to centralize the definitions of all CASE namespaces. This Pull Request does so by adding and adopting the module case_utils.namespace.
Benefits
The following statement guarantees that a script loads the up-to-date namespace constants to be used for new CASE data:
from case_utils.namespace import *
This reduces the code-review and code-upgrade burden caused by hard-coding IRI prefixes.
Risks
One plan for versioning CASE, discussed in CASE ticket ONT-64, involves embedding the version string in IRIs. This would mean every release of CASE alters all CASE IRIs in use. This risk is not specific to this Pull Request; rather, this Pull Request explores the consequences and user experience of pursuing that versioning strategy.
The benefit of not needing to review hard-coded IRI prefixes relies on other mechanisms to detect breakages from IRIs changing. This repository and some others in the casework organization rely on pytest and case_validate to detect such breakage. Adopters should be encouraged to include generated-data validation in their CI.
Revisions of prefixes need to be synchronized between producers and consumers within an ecosystem. This can have an impact on data generated prior to a new release of this repository. Some of this risk within an ecosystem can be mitigated by Python package version-pinning.
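Beyond pinning in a requirements file, a consumer could also confirm at run time that the installed package matches the version its data was produced with. The following is a minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical, and the distribution name passed in is left to the caller.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(distribution_name: str, pinned_version: str) -> bool:
    """Report whether the installed version equals the pinned one."""
    return installed_version(distribution_name) == pinned_version
```

A producer and a consumer that both call matches_pin with the same agreed version have some assurance they are using the same set of IRI prefixes.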