
Commit 63cb72b (merge commit; 2 parents: ce8da08 + 24bdcc7)

Merge branch 'develop' into reduce_redundant_analysis

110 files changed: +44248 −365 lines


.github/CODEOWNERS

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# This file lists the contributors responsible for the
+# repository content. They will also be automatically
+# asked to review any pull request made in this repository.
+
+# Each line is a file pattern followed by one or more owners.
+# The sequence matters: later patterns take precedence.
+
+# FILES  OWNERS
+* @casework/maintainers-global
+* @casework/maintainers-case-python-utilities
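A way to read the new file: GitHub resolves ownership by taking the last pattern that matches a changed path, as the file's own comment notes. A rough, dependency-free sketch of that rule (the `resolve_owners` helper is hypothetical, and real CODEOWNERS patterns are gitignore-style globs rather than `fnmatch` patterns):

```python
from fnmatch import fnmatch

def resolve_owners(path, rules):
    """Return the owners for path; the LAST matching pattern wins, as on GitHub."""
    owners = []
    for pattern, rule_owners in rules:
        # "*" matches everything; other patterns use simple glob matching here.
        if pattern == "*" or fnmatch(path, pattern):
            owners = rule_owners  # a later match overwrites an earlier one
    return owners

# The two lines added in this commit, in order:
rules = [
    ("*", ["@casework/maintainers-global"]),
    ("*", ["@casework/maintainers-case-python-utilities"]),
]
print(resolve_owners("case_utils/__init__.py", rules))
# → ['@casework/maintainers-case-python-utilities']
```

Under last-match-wins semantics, two bare `*` lines mean only the second team is requested for review; listing both teams on one line would request both.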

.github/workflows/cicd.yml

Lines changed: 4 additions & 2 deletions
@@ -23,6 +23,8 @@ on:
   release:
     types:
       - published
+  schedule:
+    - cron: '15 5 * * TUE'
 
 jobs:
   build:
@@ -31,8 +33,8 @@ jobs:
     strategy:
       matrix:
         python-version:
-          - '3.7'
-          - '3.10'
+          - '3.8'
+          - '3.11'
 
     steps:
       - uses: actions/checkout@v2
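The new `schedule` trigger fires at 05:15 UTC every Tuesday. A small sketch (hypothetical `parse_cron` helper, standard five-field cron only) that names the fields, to make the expression easier to read:

```python
def parse_cron(expr):
    """Split a five-field cron expression into named fields."""
    fields = expr.split()
    assert len(fields) == 5, "expected: minute hour day-of-month month day-of-week"
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))

# GitHub Actions evaluates cron schedules in UTC.
schedule = parse_cron("15 5 * * TUE")
print(schedule["hour"], schedule["minute"], schedule["day_of_week"])
# → 5 15 TUE
```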

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,14 +1,14 @@
 repos:
   - repo: https://github.com/psf/black
-    rev: 22.3.0
+    rev: 23.1.0
     hooks:
       - id: black
   - repo: https://github.com/pycqa/flake8
-    rev: 4.0.1
+    rev: 6.0.0
     hooks:
       - id: flake8
   - repo: https://github.com/pycqa/isort
-    rev: 5.10.1
+    rev: 5.12.0
     hooks:
       - id: isort
         name: isort (python)

CONTRIBUTE.md

Lines changed: 4 additions & 2 deletions
@@ -27,10 +27,12 @@ pushd case_utils/ontology
 git add case-0.6.0.ttl # Assuming CASE 0.6.0 was just released.
 # and/or
 git add uco-0.8.0.ttl # Assuming UCO 0.8.0 was adopted in CASE 0.6.0.
+
+git add ontology_and_version_iris.txt
 popd
 make check
 # Assuming `make check` passes:
-git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl
+git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl case_utils/ontology/ontology_and_version_iris.txt
 git commit -m "Update CASE ontology pointer to version 0.6.0" dependencies/CASE case_utils/ontology/version_info.py
 ```
 
@@ -43,4 +45,4 @@ pre-commit --version
 The `pre-commit` tool hooks into Git's commit machinery to run a set of linters and static analyzers over each change. To install `pre-commit` into Git's hooks, run:
 ```bash
 pre-commit install
-```
+```

README.md

Lines changed: 18 additions & 2 deletions
@@ -55,12 +55,17 @@ To produce the validation report as a machine-readable graph output, the `--form
 case_validate --format turtle input.json > result.ttl
 ```
 
-To use one or more supplementary ontology files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:
+To use one or more supplementary ontology or shape files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:
 
 ```bash
-case_validate --ontology-graph internal_ontology.ttl --ontology-graph experimental_shapes.ttl input.json
+case_validate \
+  --ontology-graph internal_ontology.ttl \
+  --ontology-graph experimental_shapes.ttl \
+  input.json
 ```
 
+This tool uses the `--built-version` flag, described [below](#built-versions).
+
 Other flags are reviewable with `case_validate --help`.
 
 
@@ -87,6 +92,8 @@ These commands can be used with any RDF files to run arbitrary SPARQL queries.
 
 Note that prefixes used in the SPARQL queries do not need to be defined in the SPARQL query. Their mapping will be inherited from their first definition in the input graph files. However, input graphs are not required to agree on prefix mappings, so there is potential for confusion from input argument order mattering if two input graph files disagree on what a prefix maps to. If there is concern of ambiguity from inputs, a `PREFIX` statement should be included in the query, such as is shown in [this test query](tests/case_utils/case_sparql_select/subclass.sparql).
 
+These tools use the `--built-version` flag, described [below](#built-versions).
+
 
 #### `case_sparql_construct`
 
@@ -116,6 +123,15 @@ case_sparql_select output.md input.sparql input.json [input-2.json ...]
 This [module](case_utils/local_uuid.py) provides a wrapper UUID generator, `local_uuid()`. Its main purpose is making example data generate consistent identifiers, and it intentionally includes mechanisms to make it difficult to activate this mode without awareness of the caller.
 
 
+### Built versions
+
+Several tools in this package include a flag `--built-version`. This flag tailors the tool's behavior to a certain CASE ontology version; typically, this involves mixing the ontology graph into the data graph for certain necessary knowledge expansion for pattern matching (such as making queries aware of the OWL subclass hierarchy).
+
+If not provided, the tool will assume a default value of the latest ontology version.
+
+If the special value `none` is provided, none of the ontology builds this package ships will be included in the data graph. The `none` value supports use cases that are wholly independent of CASE, such as running a test in a specialized vocabulary; and also supports use cases where a non-released CASE version is meant to be used, such as a locally revised version of CASE where some concept revisions are being reviewed.
+
+
 ## Development status
 
 This repository follows [CASE community guidance on describing development status](https://caseontology.org/resources/software.html#development_status), by adherence to noted support requirements.
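The new "Built versions" README section describes mixing an ontology build into the data graph so that pattern matching sees, for example, the OWL subclass hierarchy. A dependency-free sketch of that expansion idea, with triples as plain tuples and illustrative CURIEs (the shipped tools operate on `rdflib` graphs and the real CASE ontology instead):

```python
def expand_types(data, ontology):
    """Add rdf:type triples implied by a transitive rdfs:subClassOf hierarchy."""
    # Collect direct subclass edges from the ontology triples.
    supers = {}
    for s, p, o in ontology:
        if p == "rdfs:subClassOf":
            supers.setdefault(s, set()).add(o)

    expanded = set(data)
    for s, p, o in data:
        if p != "rdf:type":
            continue
        # Walk up the hierarchy, asserting one rdf:type triple per ancestor class.
        stack = [o]
        seen = set()
        while stack:
            cls = stack.pop()
            for parent in supers.get(cls, ()):
                if parent not in seen:
                    seen.add(parent)
                    expanded.add((s, "rdf:type", parent))
                    stack.append(parent)
    return expanded

ontology = {
    ("observable:File", "rdfs:subClassOf", "observable:ObservableObject"),
    ("observable:ObservableObject", "rdfs:subClassOf", "core:UcoObject"),
}
data = {("kb:File-1", "rdf:type", "observable:File")}
result = expand_types(data, ontology)
```

After expansion, a query for instances of `core:UcoObject` also matches `kb:File-1`, which is the kind of subclass-aware matching the `--built-version` mixing enables; passing `none` skips this step entirely.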

case_utils/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -11,6 +11,6 @@
 #
 # We would appreciate acknowledgement if the software is used.
 
-__version__ = "0.7.0"
+__version__ = "0.11.0"
 
 from . import local_uuid  # noqa: F401

case_utils/case_file/__init__.py

Lines changed: 71 additions & 16 deletions
@@ -15,7 +15,7 @@
 This module creates a graph object that provides a basic UCO characterization of a single file. The gathered metadata is among the more "durable" file characteristics, i.e. characteristics that would remain consistent when transferring a file between locations.
 """
 
-__version__ = "0.4.0"
+__version__ = "0.5.0"
 
 import argparse
 import datetime
@@ -27,7 +27,7 @@
 
 import rdflib
 
-import case_utils
+import case_utils.inherent_uuid
 from case_utils.namespace import (
     NS_RDF,
     NS_UCO_CORE,
@@ -39,6 +39,7 @@
 
 DEFAULT_PREFIX = "http://example.org/kb/"
 
+
 # Shortcut syntax for defining an immutable named tuple is noted here:
 # https://docs.python.org/3/library/typing.html#typing.NamedTuple
 # via the "See also" box here: https://docs.python.org/3/library/collections.html#collections.namedtuple
@@ -48,6 +49,8 @@ class HashDict(typing.NamedTuple):
     sha1: str
     sha256: str
     sha512: str
+    sha3_256: str
+    sha3_512: str
 
 
 def create_file_node(
@@ -57,6 +60,9 @@ def create_file_node(
     node_prefix: str = DEFAULT_PREFIX,
     disable_hashes: bool = False,
    disable_mtime: bool = False,
+    *args: typing.Any,
+    use_deterministic_uuids: bool = False,
+    **kwargs: typing.Any,
 ) -> rdflib.URIRef:
     r"""
     This function characterizes the file at filepath.
@@ -67,7 +73,7 @@ def create_file_node(
     :param filepath: The path to the file to characterize. Can be relative or absolute.
     :type filepath: str
 
-    :param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'file-' + uuid4``
+    :param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'File-' + uuid``
     :type node_iri: str
 
     :param node_prefix: The base prefix to use if node_iri is not supplied.
@@ -85,7 +91,7 @@ def create_file_node(
     node_namespace = rdflib.Namespace(node_prefix)
 
     if node_iri is None:
-        node_slug = "file-" + case_utils.local_uuid.local_uuid()
+        node_slug = "File-" + case_utils.local_uuid.local_uuid()
         node_iri = node_namespace[node_slug]
     n_file = rdflib.URIRef(node_iri)
     graph.add((n_file, NS_RDF.type, NS_UCO_OBSERVABLE.File))
@@ -94,7 +100,15 @@ def create_file_node(
     literal_basename = rdflib.Literal(basename)
 
     file_stat = os.stat(filepath)
-    n_file_facet = node_namespace["file-facet-" + case_utils.local_uuid.local_uuid()]
+
+    n_file_facet: rdflib.URIRef
+    if use_deterministic_uuids:
+        n_file_facet = case_utils.inherent_uuid.get_facet_uriref(
+            n_file, NS_UCO_OBSERVABLE.FileFacet, namespace=node_namespace
+        )
+    else:
+        n_file_facet = node_namespace["FileFacet-" + case_utils.local_uuid.local_uuid()]
+
     graph.add(
         (
             n_file_facet,
@@ -121,9 +135,16 @@ def create_file_node(
         graph.add((n_file_facet, NS_UCO_OBSERVABLE.modifiedTime, literal_mtime))
 
     if not disable_hashes:
-        n_contentdata_facet = node_namespace[
-            "content-data-facet-" + case_utils.local_uuid.local_uuid()
-        ]
+        n_contentdata_facet: rdflib.URIRef
+        if use_deterministic_uuids:
+            n_contentdata_facet = case_utils.inherent_uuid.get_facet_uriref(
+                n_file, NS_UCO_OBSERVABLE.ContentDataFacet, namespace=node_namespace
+            )
+        else:
+            n_contentdata_facet = node_namespace[
+                "ContentDataFacet-" + case_utils.local_uuid.local_uuid()
+            ]
+
         graph.add((n_file, NS_UCO_CORE.hasFacet, n_contentdata_facet))
         graph.add(
             (n_contentdata_facet, NS_RDF.type, NS_UCO_OBSERVABLE.ContentDataFacet)
@@ -140,6 +161,8 @@ def create_file_node(
             sha1obj = hashlib.sha1()
             sha256obj = hashlib.sha256()
             sha512obj = hashlib.sha512()
+            sha3_256obj = hashlib.sha3_256()
+            sha3_512obj = hashlib.sha3_512()
             stashed_error = None
             byte_tally = 0
             with open(filepath, "rb") as in_fh:
@@ -158,6 +181,8 @@ def create_file_node(
                         sha1obj.update(buf)
                         sha256obj.update(buf)
                         sha512obj.update(buf)
+                        sha3_256obj.update(buf)
+                        sha3_512obj.update(buf)
             if stashed_error is not None:
                 raise stashed_error
             current_hashdict = HashDict(
@@ -166,6 +191,8 @@ def create_file_node(
                 sha1obj.hexdigest(),
                 sha256obj.hexdigest(),
                 sha512obj.hexdigest(),
+                sha3_256obj.hexdigest(),
+                sha3_512obj.hexdigest(),
             )
             if last_hashdict == current_hashdict:
                 successful_hashdict = current_hashdict
@@ -193,26 +220,48 @@ def create_file_node(
 
         # Add confirmed hashes into graph.
         for key in successful_hashdict._fields:
-            if key not in ("md5", "sha1", "sha256", "sha512"):
+            if key not in ("md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"):
                 continue
-            n_hash = node_namespace["hash-" + case_utils.local_uuid.local_uuid()]
+
+            l_hash_method: rdflib.Literal
+            if key in ("sha3_256", "sha3_512"):
+                l_hash_method = rdflib.Literal(
+                    key.replace("_", "-").upper(),
+                    datatype=NS_UCO_VOCABULARY.HashNameVocab,
+                )
+            else:
+                l_hash_method = rdflib.Literal(
+                    key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
+                )
+
+            hash_value: str = getattr(successful_hashdict, key)
+            l_hash_value = rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary)
+
+            hash_uuid: str
+            if use_deterministic_uuids:
+                hash_uuid = str(
+                    case_utils.inherent_uuid.hash_method_value_uuid(
+                        l_hash_method, l_hash_value
+                    )
+                )
+            else:
+                hash_uuid = case_utils.local_uuid.local_uuid()
+            n_hash = node_namespace["Hash-" + hash_uuid]
+
             graph.add((n_contentdata_facet, NS_UCO_OBSERVABLE.hash, n_hash))
             graph.add((n_hash, NS_RDF.type, NS_UCO_TYPES.Hash))
             graph.add(
                 (
                     n_hash,
                     NS_UCO_TYPES.hashMethod,
-                    rdflib.Literal(
-                        key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
-                    ),
+                    l_hash_method,
                 )
             )
-            hash_value = getattr(successful_hashdict, key)
             graph.add(
                 (
                     n_hash,
                     NS_UCO_TYPES.hashValue,
-                    rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary),
+                    l_hash_value,
                )
            )
 
@@ -225,6 +274,11 @@ def main() -> None:
     parser.add_argument("--debug", action="store_true")
     parser.add_argument("--disable-hashes", action="store_true")
     parser.add_argument("--disable-mtime", action="store_true")
+    parser.add_argument(
+        "--use-deterministic-uuids",
+        action="store_true",
+        help="Use UUIDs computed using the case_utils.inherent_uuid module.",
+    )
     parser.add_argument(
         "--output-format", help="Override extension-based format guesser."
     )
@@ -257,14 +311,15 @@ def main() -> None:
     context_dictionary = {k: v for (k, v) in graph.namespace_manager.namespaces()}
     serialize_kwargs["context"] = context_dictionary
 
-    node_iri = NS_BASE["file-" + case_utils.local_uuid.local_uuid()]
+    node_iri = NS_BASE["File-" + case_utils.local_uuid.local_uuid()]
     create_file_node(
         graph,
         args.in_file,
         node_iri=node_iri,
         node_prefix=args.base_prefix,
         disable_hashes=args.disable_hashes,
         disable_mtime=args.disable_mtime,
+        use_deterministic_uuids=args.use_deterministic_uuids,
     )
 
     graph.serialize(args.out_graph, **serialize_kwargs)
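Two patterns from this file's diff can be sketched in isolation: feeding one read pass into several `hashlib` digests (now including SHA3-256 and SHA3-512), and deriving a stable node UUID from the hash method and value. The UUID half is only a rough stand-in for `case_utils.inherent_uuid` — the namespace and string encoding below are invented for illustration, not the module's actual scheme:

```python
import hashlib
import uuid

def multi_hash(data, chunk_size=65536):
    """Compute several digests over a single pass of the input bytes."""
    digests = {
        name: hashlib.new(name)
        for name in ("md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512")
    }
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset : offset + chunk_size]
        for digest in digests.values():
            digest.update(chunk)  # every digest sees the same chunk once
    return {name: digest.hexdigest() for name, digest in digests.items()}

# Illustrative namespace for the sketch; the real module defines its own UUID inputs.
DEMO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/demo")

def hash_node_uuid(method, value):
    """Name-based (version 5) UUID: same method+value always yields the same UUID."""
    return uuid.uuid5(DEMO_NAMESPACE, method + "\x1f" + value)
```

Calling `multi_hash` twice on the same bytes, or `hash_node_uuid` twice on the same method and value, returns identical results, which is the repeatability property `--use-deterministic-uuids` is after.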

case_utils/case_sparql_construct/__init__.py

Lines changed: 5 additions & 4 deletions
@@ -15,7 +15,7 @@
 This script executes a SPARQL CONSTRUCT query, returning a graph of the generated triples.
 """
 
-__version__ = "0.2.3"
+__version__ = "0.2.5"
 
 import argparse
 import logging
@@ -49,7 +49,7 @@ def main() -> None:
         "--built-version",
         choices=tuple(built_version_choices_list),
         default="case-" + CURRENT_CASE_VERSION,
-        help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release.",
+        help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release. Passing 'none' will mean no pre-built CASE ontology versions accompanying this tool will be included in the analysis.",
     )
     parser.add_argument(
         "--disallow-empty-results",
@@ -98,10 +98,11 @@ def main() -> None:
     construct_query_result = in_graph.query(construct_query_object)
     _logger.debug("type(construct_query_result) = %r." % type(construct_query_result))
     _logger.debug("len(construct_query_result) = %d." % len(construct_query_result))
-    for (row_no, row) in enumerate(construct_query_result):
+    for row_no, row in enumerate(construct_query_result):
+        assert isinstance(row, tuple)
         if row_no == 0:
             _logger.debug("row[0] = %r." % (row,))
-        out_graph.add(row)
+        out_graph.add((row[0], row[1], row[2]))
 
     output_format = None
     if args.output_format is None:
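The loop change above replaces `out_graph.add(row)` with an explicit 3-tuple. A dependency-free sketch of the pattern: a CONSTRUCT result iterates as subject-predicate-object tuples, and repacking `(row[0], row[1], row[2])` both guards that shape at runtime and makes the triple type explicit for static checkers (the stand-in generator and set-based graph here are illustrative, not `rdflib` API):

```python
def construct_rows():
    # Stand-in for a CONSTRUCT query result: an iterable of 3-tuples.
    yield ("kb:File-1", "rdf:type", "observable:File")
    yield ("kb:File-1", "core:name", "example.txt")

out_graph = set()
for row_no, row in enumerate(construct_rows()):
    assert isinstance(row, tuple)  # mirrors the new runtime guard
    # Repacking as an explicit 3-tuple narrows the element type for type checkers.
    out_graph.add((row[0], row[1], row[2]))
```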
