Commit 0106422

Merge pull request #21 from casework/AC-210
Add validation command to CASE-Utilities-Python
2 parents 98ec00e + 913bd99 commit 0106422

46 files changed, with 17838 additions and 10 deletions.

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+[submodule "dependencies/CASE"]
+	path = dependencies/CASE
+	url = https://github.com/casework/CASE.git
 [submodule "dependencies/CASE-Examples-QC"]
 	path = dependencies/CASE-Examples-QC
 	url = https://github.com/ajnelson-nist/CASE-Examples-QC.git

CONTRIBUTE.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Contributing to CASE-Utilities-Python
+
+
+## Deploying a new ontology version
+
+1. After cloning this repository, ensure the CASE submodule is checked out. This can be done with either `git submodule init && git submodule update`, `make .git_submodule_init.done.log`, or `make check`.
+2. Update the CASE submodule pointer to the new tagged release.
+3. The version of CASE is also hard-coded in [`case_utils/ontology/version_info.py`](case_utils/ontology/version_info.py). Edit the variable `CURRENT_CASE_VERSION`.
+4. From the top source directory, run `make clean`. This guarantees a clean state of this repository as well as the ontology submodules.
+5. Still from the top source directory, run `make`.
+6. Any new `.ttl` files will be created under [`case_utils/ontology/`](case_utils/ontology/). Use `git add` to add each of them. (The patch-weight of these files could overshadow manual revisions, so it is fine to commit the built files after the manual changes are committed.)
+
+Here is a sample sequence of shell commands to run the build:
+
+```bash
+# (Starting from fresh `git clone`.)
+make check
+pushd dependencies/CASE
+git checkout master
+git pull
+popd
+git add dependencies/CASE
+# (Here, edits should be made to case_utils/ontology/version_info.py)
+make
+pushd case_utils/ontology
+git add case-0.6.0.ttl # Assuming CASE 0.6.0 was just released.
+# and/or
+git add uco-0.8.0.ttl # Assuming UCO 0.8.0 was adopted in CASE 0.6.0.
+popd
+make check
+# Assuming `make check` passes:
+git commit -m "Update CASE ontology pointer to version 0.6.0" dependencies/CASE case_utils/ontology/version_info.py
+git commit -m "Build CASE 0.6.0.ttl" case_utils/ontology/case-0.6.0.ttl
+```
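
Review note: the deployment steps in CONTRIBUTE.md rely on the `CURRENT_CASE_VERSION` value and the built file name `case-<version>.ttl` staying in step. The following is a minimal sketch of that naming relationship, not code from this commit. The helper names `expected_ttl_path` and `built_file_is_present` are hypothetical, and the check runs against a throwaway directory rather than `case_utils/ontology/`.

```python
import pathlib
import tempfile


def expected_ttl_path(ontology_dir: pathlib.Path, case_version: str) -> pathlib.Path:
    """Return where the monolithic build for case_version is expected to live.

    Hypothetical helper: it only encodes the `case-<version>.ttl` naming
    pattern that appears in the CONTRIBUTE.md shell example.
    """
    return ontology_dir / ("case-" + case_version + ".ttl")


def built_file_is_present(ontology_dir: pathlib.Path, case_version: str) -> bool:
    """Similar in spirit to a `test -r` check: True only if the file exists."""
    return expected_ttl_path(ontology_dir, case_version).is_file()


# Demonstrate against a temporary directory, not the real repository.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pathlib.Path(tmp)
    before = built_file_is_present(demo_dir, "0.6.0")
    (demo_dir / "case-0.6.0.ttl").write_text("# placeholder\n")
    after = built_file_is_present(demo_dir, "0.6.0")
    print(before, after)
```

A check along these lines could be run before the final `git commit` steps to catch a version bump whose `.ttl` file was never built.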

Makefile

Lines changed: 47 additions & 3 deletions
@@ -15,7 +15,13 @@ SHELL := /bin/bash
 
 PYTHON3 ?= $(shell which python3.9 2>/dev/null || which python3.8 2>/dev/null || which python3.7 2>/dev/null || which python3.6 2>/dev/null || which python3)
 
-all:
+case_version := $(shell $(PYTHON3) case_utils/ontology/version_info.py)
+ifeq ($(case_version),)
+$(error Unable to determine CASE version)
+endif
+
+all: \
+  .ontology.done.log
 
 .PHONY: \
   download
@@ -35,10 +41,28 @@ all:
 	$(MAKE) \
 	  --directory dependencies/CASE-Examples-QC/tests \
 	  ontology_vocabulary.txt
+	test -r dependencies/CASE/ontology/master/case.ttl \
+	  || (git submodule init dependencies/CASE && git submodule update dependencies/CASE)
+	test -r dependencies/CASE/ontology/master/case.ttl
+	$(MAKE) \
+	  --directory dependencies/CASE \
+	  .git_submodule_init.done.log \
+	  .lib.done.log
+	touch $@
+
+.ontology.done.log: \
+  dependencies/CASE/ontology/master/case.ttl
+	# Do not rebuild the current ontology file if it is already present. It is expected not to change once built.
+	# touch -c: Do not create the file if it does not exist. This will convince the recursive make nothing needs to be done if the file is present.
+	touch -c case_utils/ontology/case-$(case_version).ttl
+	$(MAKE) \
+	  --directory case_utils/ontology
+	# Confirm the current monolithic file is in place.
+	test -r case_utils/ontology/case-$(case_version).ttl
 	touch $@
 
 check: \
-  .git_submodule_init.done.log
+  .ontology.done.log
 	$(MAKE) \
 	  PYTHON3=$(PYTHON3) \
 	  --directory tests \
@@ -49,12 +73,32 @@ clean:
 	  --directory tests \
 	  clean
 	@rm -f \
-	  .git_submodule_init.done.log
+	  .*.done.log
+	@# 'clean' in the ontology directory should only happen when testing and building new ontology versions. Hence, it is not called from the top-level Makefile.
+	@test ! -r dependencies/CASE/README.md \
+	  || $(MAKE) \
+	    --directory dependencies/CASE \
+	    clean
+	@# Restore CASE validation output files that do not affect CASE build process.
+	@test ! -r dependencies/CASE/README.md \
+	  || ( \
+	    cd dependencies/CASE \
+	      && git checkout \
+	        -- \
+	        tests/examples \
+	        || true \
+	  )
 	@#Remove flag files that are normally set after deeper submodules and rdf-toolkit are downloaded.
 	@rm -f \
 	  dependencies/CASE-Examples-QC/.git_submodule_init.done.log \
 	  dependencies/CASE-Examples-QC/.lib.done.log
 
+# This recipe guarantees timestamp update order, and is otherwise intended to be a no-op.
+dependencies/CASE/ontology/master/case.ttl: \
+  .git_submodule_init.done.log
+	test -r $@
+	touch $@
+
 distclean: \
   clean
 	@rm -rf \
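
Review note: the `.ontology.done.log` recipe depends on the behavior of `touch -c`, which refreshes the timestamp of an existing file but never creates a missing one. The sketch below imitates that behavior in Python to make the intent concrete. It is an illustration only, and the function name `touch_no_create` is hypothetical rather than part of this repository.

```python
import os
import pathlib
import tempfile


def touch_no_create(path: pathlib.Path) -> bool:
    """Refresh the timestamps of path if it exists; never create it.

    Returns True if the file existed (and was touched), False otherwise.
    """
    if not path.exists():
        return False
    os.utime(path, None)  # None sets access and modification times to "now".
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "case-0.6.0.ttl"

    # Case 1: the built file is absent, so nothing is created.
    missing_result = touch_no_create(target)
    still_missing = not target.exists()

    # Case 2: the built file is present but looks very old.
    target.write_text("# placeholder\n")
    os.utime(target, (0, 0))  # Force the timestamps back to the epoch.
    present_result = touch_no_create(target)
    refreshed = target.stat().st_mtime > 0
```

In the first case a later build step would still have to produce the file. In the second, the refreshed timestamp is what lets the recursive `make` conclude that the file is up to date.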

README.md

Lines changed: 29 additions & 3 deletions
@@ -20,6 +20,33 @@ Installation is demonstrated in the `.venv.done.log` target of the [`tests/`](te
 ## Usage
 
 
+### `case_validate`
+
+This repository provides `case_validate` as an adaptation of the `pyshacl` command from [RDFLib's pySHACL](https://github.com/RDFLib/pySHACL). The command-line interface is adapted to run as though `pyshacl` were provided the full CASE ontology (and adopted full UCO ontology) as both a shapes and ontology graph. "Compiled" (or, "aggregated") CASE ontologies are in the [`case_utils/ontology/`](case_utils/ontology/) directory, and are installed with `pip`, so data validation can occur without requiring networking after this repository is installed.
+
+To see a human-readable validation report of an instance-data file:
+
+```bash
+case_validate input.json
+```
+
+If `input.json` is not conformant, a report will be emitted, and `case_validate` will exit with status `1`. (This is a `pyshacl` behavior, where `0` and `1` report validation success. Status of >`1` is for other errors.)
+
+To produce the validation report as a machine-readable graph output, the `--format` flag can be used to modify the output format:
+
+```bash
+case_validate --format turtle input.json > result.ttl
+```
+
+To use one or more supplementary ontology files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:
+
+```bash
+case_validate --ontology-graph internal_ontology.ttl --ontology-graph experimental_shapes.ttl input.json
+```
+
+Other flags are reviewable with `case_validate --help`.
+
+
 ### `case_file`
 
 To characterize a file, including hashes:
@@ -86,10 +113,9 @@ This project follows [SEMVER 2.0.0](https://semver.org/) where versions are decl
 
 ## Ontology versions supported
 
-This repository supports the ontology versions that are linked as submodules in the [CASE Examples QC](https://github.com/ajnelson-nist/CASE-Examples-QC) repository. Currently, the ontology versions are:
+This repository supports the CASE ontology version that is linked as a submodule [here](dependencies/CASE). The CASE version is encoded as a variable (and checked in unit tests) in [`case_utils/ontology/version_info.py`](case_utils/ontology/version_info.py), and used throughout this code base, as `CURRENT_CASE_VERSION`.
 
-* CASE - 0.4.0
-* UCO - 0.6.0
+For instructions on how to update the CASE version for an ontology release, see [`CONTRIBUTE.md`](CONTRIBUTE.md).
 
 
 ## Repository locations
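
Review note: the README text added in this change describes three kinds of exit status for `case_validate`: `0` for conformant data, `1` for non-conformant data, and anything greater than `1` for other errors. A caller that scripts around the command might interpret the status roughly as follows. This is a sketch under that reading of the README; the function name `describe_exit_status` is hypothetical, and treating negative values as errors is an assumption made here for callers that receive signal-terminated statuses.

```python
def describe_exit_status(status: int) -> str:
    """Map a case_validate exit status onto a coarse outcome label."""
    if status == 0:
        return "conformant"
    if status == 1:
        return "non-conformant"
    # Assumption: any other value, including negative ones, is an error.
    return "error"


for code in (0, 1, 2):
    print(code, describe_exit_status(code))
```

The same mapping could be applied to the `returncode` attribute of the object returned by `subprocess.run`, so that a non-conformant input is not mistaken for a crashed run.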

case_utils/case_validate/__init__.py

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
+#!/usr/bin/env python3
+
+# This software was developed at the National Institute of Standards
+# and Technology by employees of the Federal Government in the course
+# of their official duties. Pursuant to title 17 Section 105 of the
+# United States Code this software is not subject to copyright
+# protection and is in the public domain. NIST assumes no
+# responsibility whatsoever for its use by other parties, and makes
+# no guarantees, expressed or implied, about its quality,
+# reliability, or any other characteristic.
+#
+# We would appreciate acknowledgement if the software is used.
+
+"""
+This script provides a wrapper to the pySHACL command line tool,
+available here:
+https://github.com/RDFLib/pySHACL
+
+Portions of the pySHACL command line interface are preserved and passed
+through to the underlying pySHACL validation functionality.
+
+Other portions of the pySHACL command line interface are adapted to
+CASE, specifically to support CASE and UCO as ontologies that store
+subclass hierarchy and node shapes together (rather than as separate
+ontology and shape graphs). More specifically to CASE, if no particular
+ontology or shapes graph is requested, the most recent version of CASE
+will be used. (That most recent version is shipped with this package as
+a monolithic file; see case_utils.ontology if interested in further
+details.)
+"""
+
+__version__ = "0.1.0"
+
+import argparse
+import importlib.resources
+import logging
+import os
+import pathlib
+import sys
+import typing
+
+import rdflib.util # type: ignore
+import pyshacl # type: ignore
+
+import case_utils.ontology
+
+from case_utils.ontology.version_info import *
+
+_logger = logging.getLogger(os.path.basename(__file__))
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="CASE wrapper to pySHACL command line tool.")
+
+    # Configure debug logging before running parse_args, because there
+    # could be an error raised before the construction of the argument
+    # parser.
+    logging.basicConfig(level=logging.DEBUG if ("--debug" in sys.argv or "-d" in sys.argv) else logging.INFO)
+
+    case_version_choices_list = ["none", "case-" + CURRENT_CASE_VERSION]
+
+    # Add arguments specific to case_validate.
+    parser.add_argument(
+      '-d',
+      '--debug',
+      action='store_true',
+      help='Output additional runtime messages.'
+    )
+    parser.add_argument(
+      "--built-version",
+      choices=tuple(case_version_choices_list),
+      default="case-"+CURRENT_CASE_VERSION,
+      help="Monolithic aggregation of CASE ontology files at certain versions. Does not require networking to use. Default is most recent CASE release."
+    )
+    parser.add_argument(
+      "--ontology-graph",
+      action="append",
+      help="Combined ontology (i.e. subclass hierarchy) and shapes (SHACL) file, in any format accepted by rdflib recognized by file extension (e.g. .ttl). Will supplement ontology selected by --built-version. Can be given multiple times."
+    )
+
+    # Inherit arguments from pyshacl.
+    parser.add_argument(
+      '--abort',
+      action='store_true',
+      help='(As with pyshacl CLI) Abort on first invalid data.'
+    )
+    parser.add_argument(
+      '-w',
+      '--allow-warnings',
+      action='store_true',
+      help='(As with pyshacl CLI) Shapes marked with severity of Warning or Info will not cause result to be invalid.',
+    )
+    parser.add_argument(
+      "-f",
+      "--format",
+      choices=('human', 'turtle', 'xml', 'json-ld', 'nt', 'n3'),
+      default='human',
+      help="(ALMOST as with pyshacl CLI) Choose an output format. Default is \"human\". Difference: 'table' not provided."
+    )
+    parser.add_argument(
+      '-im',
+      '--imports',
+      action='store_true',
+      help='(As with pyshacl CLI) Allow import of sub-graphs defined in statements with owl:imports.',
+    )
+    parser.add_argument(
+      '-i',
+      '--inference',
+      choices=('none', 'rdfs', 'owlrl', 'both'),
+      default='none',
+      help="(As with pyshacl CLI) Choose a type of inferencing to run against the Data Graph before validating. Default is \"none\".",
+    )
+    parser.add_argument(
+      '-o',
+      '--output',
+      dest='output',
+      nargs='?',
+      type=argparse.FileType('x'),
+      help="(ALMOST as with pyshacl CLI) Send output to a file. If absent, output will be written to stdout. Difference: If specified, file is expected not to exist. Clarification: Does NOT influence --format flag's default value of \"human\". (I.e., any machine-readable serialization format must be specified with --format.)",
+      default=sys.stdout,
+    )
+
+    parser.add_argument("in_graph")
+
+    args = parser.parse_args()
+
+    data_graph = rdflib.Graph()
+    data_graph.parse(args.in_graph)
+
+    ontology_graph = rdflib.Graph()
+    if args.built_version != "none":
+        ttl_filename = args.built_version + ".ttl"
+        _logger.debug("ttl_filename = %r.", ttl_filename)
+        ttl_data = importlib.resources.read_text(case_utils.ontology, ttl_filename)
+        ontology_graph.parse(data=ttl_data, format="turtle")
+    if args.ontology_graph:
+        for arg_ontology_graph in args.ontology_graph:
+            _logger.debug("arg_ontology_graph = %r.", arg_ontology_graph)
+            ontology_graph.parse(arg_ontology_graph)
+
+    # Determine output format.
+    # pySHACL's determination of output formatting is handled solely
+    # through the -f flag. Other CASE CLI tools handle format
+    # determination by output file extension. case_validate will defer
+    # to pySHACL behavior, as other CASE tools don't (at the time of
+    # this writing) have the value "human" as an output format.
+    validator_kwargs : typing.Dict[str, str] = dict()
+    if args.format != "human":
+        validator_kwargs['serialize_report_graph'] = args.format
+
+    validate_result : typing.Tuple[
+      bool,
+      typing.Union[Exception, bytes, str, rdflib.Graph],
+      str
+    ]
+    validate_result = pyshacl.validate(
+      data_graph,
+      shacl_graph=ontology_graph,
+      ont_graph=ontology_graph,
+      inference=args.inference,
+      abort_on_first=args.abort,
+      allow_warnings=True if args.allow_warnings else False,
+      debug=True if args.debug else False,
+      do_owl_imports=True if args.imports else False,
+      **validator_kwargs
+    )
+
+    # Relieve RAM of the data graph after validation has run.
+    del data_graph
+
+    conforms = validate_result[0]
+    validation_graph = validate_result[1]
+    validation_text = validate_result[2]
+
+    # NOTE: The output logistics code is adapted from pySHACL's file
+    # pyshacl/cli.py. This section should be monitored for code drift.
+    if args.format == "human":
+        args.output.write(validation_text)
+    else:
+        if isinstance(validation_graph, rdflib.Graph):
+            raise NotImplementedError("rdflib.Graph expected not to be created from --format value %r." % args.format)
+        elif isinstance(validation_graph, bytes):
+            args.output.write(validation_graph.decode("utf-8"))
+        elif isinstance(validation_graph, str):
+            args.output.write(validation_graph)
+        else:
+            raise NotImplementedError("Unexpected result type returned from validate: %r." % type(validation_graph))
+
+    sys.exit(0 if conforms else 1)
+
+if __name__ == "__main__":
+    main()
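
Review note: two argument behaviors in `main()` are easy to overlook. `--ontology-graph` uses `action="append"`, so the parsed value is `None` when the flag is absent, which is why the code guards with `if args.ontology_graph:`. `--built-version` accepts only `"none"` or the single current CASE build. The reduced parser below sketches both behaviors with the standard library only. It is not the module's full parser, and the `CURRENT_CASE_VERSION` value here is a placeholder, not the value shipped in `version_info.py`.

```python
import argparse

# Placeholder for illustration; the real value is defined in
# case_utils/ontology/version_info.py.
CURRENT_CASE_VERSION = "0.0.0"


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the two CASE-specific options."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--built-version",
        choices=("none", "case-" + CURRENT_CASE_VERSION),
        default="case-" + CURRENT_CASE_VERSION,
    )
    parser.add_argument("--ontology-graph", action="append")
    parser.add_argument("in_graph")
    return parser


parser = build_parser()

# No optional flags: the default build is selected, and the list is None.
defaults = parser.parse_args(["input.json"])

# Repeating --ontology-graph accumulates the values in order.
repeated = parser.parse_args(
    ["--ontology-graph", "a.ttl", "--ontology-graph", "b.ttl", "input.json"]
)

# "none" opts out of the bundled monolithic ontology.
opted_out = parser.parse_args(["--built-version", "none", "input.json"])

print(defaults.built_version, defaults.ontology_graph)
print(repeated.ontology_graph)
print(opted_out.built_version)
```

Combining `--built-version none` with one or more `--ontology-graph` flags therefore validates against only the supplied files, since the bundled ontology is skipped and the supplementary graphs are still parsed.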
