Should namespace constants be provided in a file namespace.py? #38

Closed
@ajnelson-nist

Background

RDFLib usage frequently involves namespaces to handle IRI prefixes. For example, for the concept http://example.org/ontology/Foo, there are three ways to instantiate an rdflib.URIRef:

import rdflib

# 1.
node_1 = rdflib.URIRef("http://example.org/ontology/Foo")

# 2.
NS_EX = rdflib.Namespace("http://example.org/ontology/")
node_2 = NS_EX["Foo"]

# 3.
node_3 = NS_EX.Foo

assert node_1 == node_2
assert node_1 == node_3
assert node_2 == node_3

CASE scripts, in this repository and in downstream repositories in the casework organization, define Namespace variables so that they can use the shorter 2nd and 3rd forms.

The recent release of UCO, 0.8.0, altered all IRIs (anchored in UCO proposal CP-107). This will necessitate revision of all code bases that had followed CASE's strategy.

One way to reduce the required effort is to centralize the definitions of all CASE namespaces. This Pull Request does so by adding and adopting the module case_utils.namespace.

Benefits

The following statement guarantees that a script loads the up-to-date namespace constants to be used for new CASE data:

from case_utils.namespace import *

This reduces the code-review and code-upgrade burden caused by hard-coding IRI prefixes.

Risks

One plan for versioning CASE, with discussion anchored in CASE ticket ONT-64, involves embedding the version string in IRIs. This would mean every release of CASE would alter all CASE IRIs in use. This risk is not specific to this Pull Request; rather, this Pull Request explores the consequences and user experience of pursuing that versioning strategy.

The benefit of not needing to review hard-coded IRI prefixes relies on other mechanisms to detect breakages from IRIs changing. This repository and some others in the casework organization rely on pytest and case_validate to detect such breakage. Adopters should be encouraged to include generated-data validation in their CI.
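
The following sketch shows one crude way a pytest test could flag retired prefixes in generated data. It is a plain string scan, not case_validate and not a substitute for it; the find_stale_iris helper, the prefixes, and the sample N-Triples line are hypothetical.

```python
from typing import List, Sequence

# Hypothetical retired prefix, for illustration only; not a real CASE or UCO IRI.
STALE_PREFIXES = ("http://example.org/ontology/1.0.0/",)


def find_stale_iris(
    serialized_graph: str, stale_prefixes: Sequence[str] = STALE_PREFIXES
) -> List[str]:
    """Return whitespace-delimited tokens that start with a retired IRI prefix."""
    stale = []
    for token in serialized_graph.split():
        # N-Triples wraps IRIs in angle brackets; strip them before comparing.
        iri = token.strip("<>")
        if iri.startswith(tuple(stale_prefixes)):
            stale.append(iri)
    return stale


def test_generated_data_uses_current_prefixes() -> None:
    # Stand-in for a script's generated output, serialized as N-Triples.
    generated = (
        "<http://example.org/kb/thing-1> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://example.org/ontology/2.0.0/Foo> ."
    )
    assert find_stale_iris(generated) == []
```

A check like this only catches prefixes that are already known to be retired, so it complements schema-level validation rather than replacing it.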

Revisions of prefixes need to be synchronized between producers and consumers within an ecosystem. This can have an impact on data generated prior to a new release of this repository. Some of this risk within an ecosystem can be mitigated by Python package version-pinning.
