Release 0.3.0 #26

Merged
merged 62 commits into from
Nov 10, 2021
d8be3d2
Merge pull request #14 from casework/release-0.2.1
ajnelson-nist Sep 7, 2021
ce43114
Bump rdflib dependency to 6.0.1, deprecating local guess_format
ajnelson-nist Sep 28, 2021
eeae6ac
Remove 'format=' parameter from graph parse() and serialize() calls
ajnelson-nist Sep 28, 2021
83fd6ed
Merge pull request #15 from casework/use_rdflib_6.0.1_features
balon Sep 28, 2021
d1ca249
Add mypy source review to current state
ajnelson-nist Oct 7, 2021
9a582d7
Designate case_utils as a typed package
ajnelson-nist Oct 7, 2021
f99bcea
Add minimal type signatures to test scripts
ajnelson-nist Oct 7, 2021
97616b4
Add minimal type review to case_utils
ajnelson-nist Oct 7, 2021
43ff130
Add type signatures to most of case_utils
ajnelson-nist Oct 7, 2021
8854561
Use warnings.warn instead of unused variable _logger
ajnelson-nist Oct 7, 2021
7fb54ef
Add type signatures to case_utils.case_file.create_file_node
ajnelson-nist Oct 7, 2021
42000c9
Add type signatures required by mypy --strict
ajnelson-nist Oct 7, 2021
4c68051
Merge pull request #16 from casework/AC-211
balon Oct 8, 2021
5edb610
Update string type from graph.serialize
ajnelson-nist Oct 27, 2021
b066b5f
Merge pull request #17 from casework/update_graph_serialize_type
ajnelson-nist Oct 27, 2021
c7439ac
Add wheel package to tests venv construction
ajnelson-nist Oct 27, 2021
b9afdaf
Merge pull request #18 from casework/AC-195
ajnelson-nist Oct 27, 2021
b5a04ae
Require rdflib >= 6.0.2
ajnelson-nist Oct 27, 2021
8025299
Remove tests around Python 3.6 detection
ajnelson-nist Oct 27, 2021
c8ddab1
Merge pull request #19 from casework/bump_rdflib
ajnelson-nist Oct 27, 2021
705dc77
Align test directories with /case_utils structure
ajnelson-nist Oct 27, 2021
444ae1f
Regenerate Make-managed files
ajnelson-nist Oct 27, 2021
3b2ba8e
Merge pull request #20 from casework/realign_test_directories
ajnelson-nist Oct 27, 2021
73800db
Track CASE ontology as submodule
ajnelson-nist Oct 28, 2021
78de18a
Use Java 8 for rdf-toolkit
ajnelson-nist Oct 29, 2021
98ec00e
Merge pull request #22 from casework/OC-164
ajnelson-nist Oct 29, 2021
f508280
Add recipes and instructions to build ontology
ajnelson-nist Oct 28, 2021
4e78ef8
Build case-0.5.0.ttl
ajnelson-nist Oct 28, 2021
e824307
Add tests to confirm hard-coded version information against ontology
ajnelson-nist Oct 28, 2021
4a42714
Merge branch 'OC-164' into AC-210-v3
ajnelson-nist Oct 29, 2021
e1135a7
Add case_validate
ajnelson-nist Oct 28, 2021
fb5cc0c
Generate Make-managed files
ajnelson-nist Oct 28, 2021
0708fc2
Validate output of case_file test
ajnelson-nist Oct 28, 2021
94b9607
Fix or-gated descent syntax
ajnelson-nist Nov 2, 2021
d803cbe
Add --output to case_validate, and document and test flag adaptation
ajnelson-nist Nov 2, 2021
6066e99
Generate Make-managed files
ajnelson-nist Nov 2, 2021
b89231a
Make JSON-LD blank nodes consistently generated
ajnelson-nist Nov 2, 2021
913bd99
Regenerate Make-managed output
ajnelson-nist Nov 2, 2021
0106422
Merge pull request #21 from casework/AC-210
ajnelson-nist Nov 2, 2021
844eeab
Bump CASE-Examples-QC pointer
ajnelson-nist Nov 10, 2021
12b26b8
Clean whitespace
ajnelson-nist Nov 10, 2021
d360c7b
Sort imports
ajnelson-nist Nov 10, 2021
b393a37
Use one rdf-toolkit.jar
ajnelson-nist Nov 10, 2021
edeff21
Reduce Python 3 detection
ajnelson-nist Nov 10, 2021
c42df16
Merge pull request #23 from casework/cleanup
ajnelson-nist Nov 10, 2021
adff60e
Align variable name
ajnelson-nist Nov 10, 2021
e0ca0ce
Add multi-data-graph handling to case_validate
ajnelson-nist Nov 10, 2021
822d2a1
Test case_validate multi-data-graph handling
ajnelson-nist Nov 10, 2021
3901840
Merge pull request #24 from casework/AC-210
ajnelson-nist Nov 10, 2021
8ca2794
Adjust formatting for existing arguments
ajnelson-nist Nov 10, 2021
9af628a
Configure logging before argument parsing
ajnelson-nist Nov 10, 2021
16a5a2b
Build subclass hierarchy as additional static resource
ajnelson-nist Nov 10, 2021
f4ec175
Build case-0.5.0-subclasses.ttl
ajnelson-nist Nov 10, 2021
6eb2be4
Load subclass hierarchy when subClassOf used in query
ajnelson-nist Nov 10, 2021
6ae59f3
Move W3C test data files to "w3-" prefixed files
ajnelson-nist Nov 10, 2021
4cf0967
Add subclass tests for case_sparql_* commands
ajnelson-nist Nov 10, 2021
8fe7461
Load subclass hierarchy to prepare for case_file implying ObservableO…
ajnelson-nist Nov 10, 2021
fcfdd4b
Use ObservableObject subclass File for case_file
ajnelson-nist Nov 10, 2021
6f9458f
Regenerate Make-managed files
ajnelson-nist Nov 10, 2021
c39a9cd
Merge pull request #25 from casework/OC-65
ajnelson-nist Nov 10, 2021
fac290a
Bump versions
ajnelson-nist Nov 10, 2021
1acc5e2
Remove obviated remark about Python <3.7
ajnelson-nist Nov 10, 2021
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -27,6 +27,10 @@ jobs:

steps:
- uses: actions/checkout@v2
- uses: actions/setup-java@v2
with:
distribution: 'adopt'
java-version: '8'
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "dependencies/CASE"]
path = dependencies/CASE
url = https://github.com/casework/CASE.git
[submodule "dependencies/CASE-Examples-QC"]
path = dependencies/CASE-Examples-QC
url = https://github.com/ajnelson-nist/CASE-Examples-QC.git
34 changes: 34 additions & 0 deletions CONTRIBUTE.md
@@ -0,0 +1,34 @@
# Contributing to CASE-Utilities-Python


## Deploying a new ontology version

1. After cloning this repository, ensure the CASE submodule is checked out. This can be done with either `git submodule init && git submodule update`, `make .git_submodule_init.done.log`, or `make check`.
2. Update the CASE submodule pointer to the new tagged release.
3. The version of CASE is also hard-coded in [`case_utils/ontology/version_info.py`](case_utils/ontology/version_info.py). Edit the variable `CURRENT_CASE_VERSION`.
4. From the top source directory, run `make clean`. This guarantees a clean state of this repository as well as the ontology submodules.
5. Still from the top source directory, run `make`.
6. Any new `.ttl` files will be created under [`case_utils/ontology/`](case_utils/ontology/). Use `git add` to add each of them. (The patch-weight of these files could overshadow manual revisions, so it is fine to commit the built files after the manual changes are committed.)

Here is a sample sequence of shell commands to run the build:

```bash
# (Starting from fresh `git clone`.)
make check
pushd dependencies/CASE
git checkout master
git pull
popd
git add dependencies/CASE
# (Here, edits should be made to case_utils/ontology/version_info.py)
make
pushd case_utils/ontology
git add case-0.6.0.ttl # Assuming CASE 0.6.0 was just released.
# and/or
git add uco-0.8.0.ttl # Assuming UCO 0.8.0 was adopted in CASE 0.6.0.
popd
make check
# Assuming `make check` passes:
git commit -m "Update CASE ontology pointer to version 0.6.0" dependencies/CASE case_utils/ontology/version_info.py
git commit -m "Build CASE 0.6.0.ttl" case_utils/ontology/case-0.6.0.ttl
```
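
Before committing, it can help to confirm that the built files are actually present. The following is a minimal sketch, not code from this repository: the helper names `expected_ontology_files` and `missing_ontology_files` are hypothetical, and the file-name pattern is taken from the top-level Makefile (`case-$(case_version).ttl` and `case-$(case_version)-subclasses.ttl`).

```python
from pathlib import Path
from typing import List


def expected_ontology_files(case_version: str) -> List[str]:
    """Return the file names the build is expected to create for one CASE version.

    Hypothetical helper; the naming pattern is assumed from the top-level Makefile.
    """
    return [
        f"case-{case_version}.ttl",
        f"case-{case_version}-subclasses.ttl",
    ]


def missing_ontology_files(ontology_dir: Path, case_version: str) -> List[str]:
    """List the expected files that are absent from ontology_dir."""
    return [
        name
        for name in expected_ontology_files(case_version)
        if not (ontology_dir / name).is_file()
    ]
```

For example, `missing_ontology_files(Path("case_utils/ontology"), "0.6.0")` would return an empty list once both files for CASE 0.6.0 have been built.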
57 changes: 52 additions & 5 deletions Makefile
@@ -13,9 +13,15 @@

SHELL := /bin/bash

PYTHON3 ?= $(shell which python3.9 2>/dev/null || which python3.8 2>/dev/null || which python3.7 2>/dev/null || which python3.6 2>/dev/null || which python3)
PYTHON3 ?= python3

all:
case_version := $(shell $(PYTHON3) case_utils/ontology/version_info.py)
ifeq ($(case_version),)
$(error Unable to determine CASE version)
endif

all: \
.ontology.done.log

.PHONY: \
download
@@ -31,14 +37,35 @@ all:
# Build an ontology terms list, which has a side effect of initiating further submodules.
$(MAKE) \
--directory dependencies/CASE-Examples-QC \
download
.git_submodule_init.done.log \
.venv.done.log
$(MAKE) \
--directory dependencies/CASE-Examples-QC/tests \
ontology_vocabulary.txt
test -r dependencies/CASE/ontology/master/case.ttl \
|| (git submodule init dependencies/CASE && git submodule update dependencies/CASE)
test -r dependencies/CASE/ontology/master/case.ttl
$(MAKE) \
--directory dependencies/CASE \
.git_submodule_init.done.log \
.lib.done.log
touch $@

.ontology.done.log: \
dependencies/CASE/ontology/master/case.ttl
# Do not rebuild the current ontology file if it is already present. It is expected not to change once built.
# touch -c: Do not create the file if it does not exist. This will convince the recursive make nothing needs to be done if the file is present.
touch -c case_utils/ontology/case-$(case_version).ttl
touch -c case_utils/ontology/case-$(case_version)-subclasses.ttl
$(MAKE) \
--directory case_utils/ontology
# Confirm the current monolithic file is in place.
test -r case_utils/ontology/case-$(case_version).ttl
test -r case_utils/ontology/case-$(case_version)-subclasses.ttl
touch $@

check: \
.git_submodule_init.done.log
.ontology.done.log
$(MAKE) \
PYTHON3=$(PYTHON3) \
--directory tests \
@@ -49,12 +76,32 @@ clean:
--directory tests \
clean
@rm -f \
.git_submodule_init.done.log
.*.done.log
@# 'clean' in the ontology directory should only happen when testing and building new ontology versions. Hence, it is not called from the top-level Makefile.
@test ! -r dependencies/CASE/README.md \
|| $(MAKE) \
--directory dependencies/CASE \
clean
@# Restore CASE validation output files that do not affect CASE build process.
@test ! -r dependencies/CASE/README.md \
|| ( \
cd dependencies/CASE \
&& git checkout \
-- \
tests/examples \
|| true \
)
@#Remove flag files that are normally set after deeper submodules and rdf-toolkit are downloaded.
@rm -f \
dependencies/CASE-Examples-QC/.git_submodule_init.done.log \
dependencies/CASE-Examples-QC/.lib.done.log

# This recipe guarantees timestamp update order, and is otherwise intended to be a no-op.
dependencies/CASE/ontology/master/case.ttl: \
.git_submodule_init.done.log
test -r $@
touch $@

distclean: \
clean
@rm -rf \
36 changes: 31 additions & 5 deletions README.md
@@ -20,6 +20,33 @@ Installation is demonstrated in the `.venv.done.log` target of the [`tests/`](te
## Usage


### `case_validate`

This repository provides `case_validate` as an adaptation of the `pyshacl` command from [RDFLib's pySHACL](https://github.com/RDFLib/pySHACL). The command-line interface is adapted to run as though `pyshacl` were provided the full CASE ontology (and adopted full UCO ontology) as both a shapes and ontology graph. "Compiled" (or, "aggregated") CASE ontologies are in the [`case_utils/ontology/`](case_utils/ontology/) directory, and are installed with `pip`, so data validation can occur without requiring networking after this repository is installed.

To see a human-readable validation report of an instance-data file:

```bash
case_validate input.json [input-2.json ...]
```

If `input.json` is not conformant, a report will be emitted, and `case_validate` will exit with status `1`. (This is a `pyshacl` behavior: status `0` reports a conformant graph, and status `1` reports a non-conformant graph. A status greater than `1` indicates some other error.)

To produce the validation report as a machine-readable graph output, the `--format` flag can be used to modify the output format:

```bash
case_validate --format turtle input.json > result.ttl
```

To use one or more supplementary ontology files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:

```bash
case_validate --ontology-graph internal_ontology.ttl --ontology-graph experimental_shapes.ttl input.json
```

Other flags are reviewable with `case_validate --help`.
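
When calling `case_validate` from a script, the exit status is the simplest signal to act on. The following is a minimal sketch, assuming `case_validate` is installed and on `PATH`. The function names `describe_exit_status` and `run_case_validate` are hypothetical, and the status meanings follow the description above (`0` and `1` for the validation outcome, anything else for other errors).

```python
import subprocess
from typing import List


def describe_exit_status(status: int) -> str:
    """Map an exit status to a coarse outcome label (hypothetical helper)."""
    if status == 0:
        return "conformant"
    if status == 1:
        return "non-conformant"
    # Any other status is treated as an error unrelated to conformance.
    return "error"


def run_case_validate(paths: List[str]) -> str:
    """Run case_validate (assumed to be on PATH) and classify its exit status."""
    completed = subprocess.run(
        ["case_validate", *paths],
        capture_output=True,
        text=True,
    )
    return describe_exit_status(completed.returncode)
```

Separating the status mapping from the subprocess call keeps the mapping easy to test without having the command installed.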


### `case_file`

To characterize a file, including hashes:
@@ -39,6 +66,8 @@ case_file --disable-hashes sample.txt.json sample.txt

Two commands are provided to generate output from a SPARQL query and one or more input graphs. Input graphs can be any graph, such as instance data or supplementary ontology files that supply custom class definitions or other external ontologies.

These commands can be used with any RDF files to run arbitrary SPARQL queries. They have one additional behavior tailored to CASE: If a path query is used for subclasses, the CASE subclass hierarchy will be loaded to supplement the input graph. An expected use case of this feature is subclasses of `ObservableObject`. For instance, if a data graph included an object with only the class `uco-observable:File` specified, the query `?x a/rdfs:subClassOf* uco-observable:ObservableObject` would match `?x` against that object.
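
To illustrate what the `a/rdfs:subClassOf*` path matches, here is a minimal plain-Python sketch. It is not how rdflib evaluates the path and is not code from this repository. The one-entry hierarchy is a simplified, illustrative fragment (the real ontology may place intermediate classes between `File` and `ObservableObject`), and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Illustrative fragment only: child class -> directly asserted parent classes.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "uco-observable:File": {"uco-observable:ObservableObject"},
}


def superclasses_reflexive(cls: str, hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """Return cls plus every class reachable by zero or more subClassOf steps."""
    seen: Set[str] = set()
    stack: List[str] = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(hierarchy.get(current, set()))
    return seen


def matches(node_types: Set[str], target: str, hierarchy: Dict[str, Set[str]]) -> bool:
    """True if any asserted type of the node is the target or a subclass of it."""
    return any(target in superclasses_reflexive(t, hierarchy) for t in node_types)
```

Without the hierarchy loaded (an empty dictionary here), an object typed only as `uco-observable:File` would not match `uco-observable:ObservableObject`, which is why the commands supplement the input graph.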


#### `case_sparql_construct`

@@ -62,8 +91,6 @@ case_sparql_select output.html input.sparql input.json [input-2.json ...]
case_sparql_select output.md input.sparql input.json [input-2.json ...]
```

Note that `case_sparql_select` is not guaranteed to function with Pythons below version 3.7.


### `local_uuid`

@@ -86,10 +113,9 @@ This project follows [SEMVER 2.0.0](https://semver.org/) where versions are decl

## Ontology versions supported

This repository supports the ontology versions that are linked as submodules in the [CASE Examples QC](https://github.com/ajnelson-nist/CASE-Examples-QC) repository. Currently, the ontology versions are:
This repository supports the CASE ontology version that is linked as a submodule [here](dependencies/CASE). The CASE version is encoded as a variable (and checked in unit tests) in [`case_utils/ontology/version_info.py`](case_utils/ontology/version_info.py), and used throughout this code base, as `CURRENT_CASE_VERSION`.

* CASE - 0.4.0
* UCO - 0.6.0
For instructions on how to update the CASE version for an ontology release, see [`CONTRIBUTE.md`](CONTRIBUTE.md).


## Repository locations
38 changes: 11 additions & 27 deletions case_utils/__init__.py
@@ -11,35 +11,19 @@
#
# We would appreciate acknowledgement if the software is used.

__version__ = "0.2.1"
__version__ = "0.3.0"

import rdflib.util
import typing
import warnings

from . import local_uuid

def guess_format(fpath, fmap=None):
"""
This function is a wrapper around rdflib.util.guess_format(), adding that the .json extension should be recognized as JSON-LD.

:param fpath: File path.
:type fpath: string
import rdflib.util # type: ignore

:param fmap: Mapper dictionary; see rdflib.util.guess_format() for further description. Note that as in rdflib 5.0.0, supplying this argument overwrites, not augments, the suffix format map used by rdflib.
:type fmap: dict

:returns: RDF file format, fit for rdflib.Graph.parse() or .serialize(); or, None if file extension not mapped.
:rtype: string
"""

assert fmap is None or isinstance(fmap, dict), "Type check failed"
from . import local_uuid

if fmap is None:
updated_fmap = {key:rdflib.util.SUFFIX_FORMAT_MAP[key] for key in rdflib.util.SUFFIX_FORMAT_MAP}
if not "json" in updated_fmap:
updated_fmap["json"] = "json-ld"
if not "jsonld" in updated_fmap:
updated_fmap["jsonld"] = "json-ld"
else:
updated_fmap = {k:fmap[k] for k in fmap}
def guess_format(
fpath : str,
fmap : typing.Optional[typing.Dict[str, str]] = None
) -> typing.Optional[str]:
warnings.warn("The functionality in case_utils.guess_format is now upstream. Please revise your code to use rdflib.util.guess_format. The function arguments remain the same. case_utils.guess_format will be removed in case_utils 0.4.0.", DeprecationWarning)

return rdflib.util.guess_format(fpath, updated_fmap)
return rdflib.util.guess_format(fpath, fmap) # type: ignore
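
The deprecation pattern used above (warn, then delegate upstream) can be exercised without rdflib. The following is a minimal, self-contained sketch: `upstream_guess_format` and `_DEMO_SUFFIX_MAP` are hypothetical stand-ins for the upstream function and its suffix map, and `stacklevel=2` is an optional addition not present in the change above, used so the warning points at the caller.

```python
import typing
import warnings

# Stand-in suffix map for illustration only; the real mapping lives upstream.
_DEMO_SUFFIX_MAP: typing.Dict[str, str] = {
    "json": "json-ld",
    "jsonld": "json-ld",
    "ttl": "turtle",
}


def upstream_guess_format(
    fpath: str,
    fmap: typing.Optional[typing.Dict[str, str]] = None,
) -> typing.Optional[str]:
    """Hypothetical stand-in for the upstream format guesser."""
    suffix_map = _DEMO_SUFFIX_MAP if fmap is None else fmap
    suffix = fpath.rsplit(".", 1)[-1].lower() if "." in fpath else ""
    return suffix_map.get(suffix)


def guess_format(
    fpath: str,
    fmap: typing.Optional[typing.Dict[str, str]] = None,
) -> typing.Optional[str]:
    """Deprecated wrapper: emit a DeprecationWarning, then delegate."""
    warnings.warn(
        "guess_format is deprecated; call the upstream function instead.",
        DeprecationWarning,
        stacklevel=2,  # Attribute the warning to the caller, not this wrapper.
    )
    return upstream_guess_format(fpath, fmap)
```

Callers can surface the warning during testing with `warnings.simplefilter("always")` or by running Python with `-W error::DeprecationWarning`.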