Add inherence UUID functions #112

Merged 8 commits on May 9, 2023

ajnelson-nist
Member

This PR incorporates two others:

This should not be merged before a chance for discussion at the Adoption Committee meeting, 2023-05-09.

…f class's IRI

While drafting hand-written example data, some drafters (myself
included) found it helpful to disambiguate `owl:NamedIndividual`s
from `owl:Class`es by spelling the class differently in the IRI.  Taking
`uco-observable:FileFacet` as an example, it was frequently
written as `FileFacet` when referring to the class, and `file-facet-...`
when referring to an individual.

Unfortunately, trying to carry this pattern forward is likely to create
a technological burden.  Camel casing can't always be assumed to apply
straightforwardly, and would cause special-case logic to be needed.
See e.g.:

* `uco-observable:WifiAddressFacet` would split on capital letters
  to `kb:wifi-address-facet`, which doesn't seem to be a problem;
* `uco-location:GPSCoordinatesFacet` would induce
  `kb:g-p-s-coordinates-facet`, which seems far less obviously
  acceptable;
* `uco-observable:HTTPConnectionFacet` would split to
  `kb:h-t-t-p-connection-facet`, which may be the last convincing we
  need.

Rather than invest in preserving the lowercased, hyphenated suffix
scheme, this patch removes the question and now has individuals use the
last path-segment of the class's IRI.
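
The difference between the two naming schemes can be sketched in a few lines of Python. This is an illustration only, not code from the patch: `naive_hyphenate` and `individual_local_name` are hypothetical helpers, the class IRI is used as an example value, and the `<segment>-<uuid>` layout of the individual's name is an assumption.

```python
import re
import uuid


def naive_hyphenate(class_local_name: str) -> str:
    """Split a CamelCase name at each capital letter, lowercase the pieces, and hyphenate."""
    words = re.findall(r"[A-Z][a-z0-9]*", class_local_name)
    return "-".join(word.lower() for word in words)


def individual_local_name(class_iri: str, node_uuid: uuid.UUID) -> str:
    """Reuse the last path segment of the class IRI, unchanged, as the name prefix.

    The "<segment>-<uuid>" layout is an assumption made for this sketch.
    """
    last_segment = class_iri.rstrip("/").rsplit("/", 1)[-1]
    return f"{last_segment}-{node_uuid}"


print(naive_hyphenate("WifiAddressFacet"))     # wifi-address-facet
print(naive_hyphenate("GPSCoordinatesFacet"))  # g-p-s-coordinates-facet
print(naive_hyphenate("HTTPConnectionFacet"))  # h-t-t-p-connection-facet

# Example class IRI and a fixed UUID, both chosen only for illustration.
print(
    individual_local_name(
        "https://ontology.unifiedcyberontology.org/uco/observable/FileFacet",
        uuid.UUID("00000000-0000-4000-8000-000000000000"),
    )
)  # FileFacet-00000000-0000-4000-8000-000000000000
```

Reusing the segment verbatim needs no acronym handling at all, which is the special-case logic the hyphenated scheme would have required.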

A follow-on patch will regenerate Make-managed files.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This patch enables review of Python code inlined in the `case-utils`
modules' docstrings.

An initial docstring test is also included, because `pytest` reports an
error if it is called and no tests are found.

References:
* https://docs.pytest.org/en/7.1.x/how-to/doctest.html
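
A docstring test of the kind this patch turns on might look like the sketch below. The function `local_name` is a hypothetical stand-in, not one of the `case-utils` functions. `pytest --doctest-modules` is the pytest option that collects such examples, and pytest exits with code 5 when it collects no tests, which is presumably why an initial test is needed.

```python
import doctest


def local_name(iri: str) -> str:
    """
    Return the text after the last slash of an IRI.

    >>> local_name("http://example.org/kb/FileFacet")
    'FileFacet'
    """
    return iri.rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Running the file directly checks the docstring example with the
    # standard library; `pytest --doctest-modules` collects the same example.
    doctest.testmod()
```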

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
… case_file

A follow-on patch will regenerate Make-managed files.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed on Make-managed files.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Member Author

The summary of this PR is:

  • A new module is added for using deterministic, instead of random, UUIDs.
  • The module is expected to be used on an opt-in basis. By default, UUIDs generated by case-utils are currently, and will remain, UUIDv4 (non-deterministic).
  • The module handles making deterministic UUIDs for instances of uco-types:Hash and of uco-core:Facet subclasses. The module documents what seeds the UUIDs. In short, a Hash is wholly determined by its hashMethod and hashValue values, and a Facet is seeded only by a UUID representation of the UcoObject in which it inheres (i.e. what it is attached to by uco-core:hasFacet). ("UUID representation" is documented in the module.)
    • The primary benefit of deterministic Hashes is deduplication and correlation without comparing property-values. (E.g. a relational database could find a Hash by indexing the UUID of its IRI.)
    • The primary benefit of deterministic Facets is enforcing uniqueness per Facet class, even if all that is known about a UcoObject is its IRI. (In UCO, a UcoObject should not have two instances of the same Facet class; e.g. a File should not have two FileFacets. Ontology proposals are being drafted to clarify and encode this.) This is a significant need in workflows that perform multi-tool processing on graph individuals. (So, this partially supports AC-162.)
  • The module is added to case_file, and functionality enabled with a new flag, --use-deterministic-uuids.
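
The seeding idea in the summary above can be sketched with the standard library's `uuid.uuid5`. This is a minimal sketch under stated assumptions, not the module's actual seeding: the function names, the `urn:example:hash:` seed string, the lowercasing of the hash value, and the use of the Facet class IRI as the name under the UcoObject's UUID are all hypothetical choices made for illustration.

```python
import uuid


def uuid_representation(iri: str) -> uuid.UUID:
    """Map an IRI to a UUIDv5 (assumption: seeded in the standard URL namespace)."""
    return uuid.uuid5(uuid.NAMESPACE_URL, iri)


def facet_uuid(uco_object_iri: str, facet_class_iri: str) -> uuid.UUID:
    """Derive a Facet's UUID from the UcoObject it inheres in.

    Assumption: the UcoObject's UUID acts as the namespace and the Facet
    class IRI as the name, so each object gets one UUID per Facet class.
    """
    return uuid.uuid5(uuid_representation(uco_object_iri), facet_class_iri)


def hash_uuid(hash_method: str, hash_value: str) -> uuid.UUID:
    """Derive a Hash's UUID from its hashMethod and hashValue alone.

    The seed string format and the lowercasing are hypothetical.
    """
    seed = f"urn:example:hash:{hash_method}:{hash_value.lower()}"
    return uuid.uuid5(uuid.NAMESPACE_URL, seed)


# Illustrative IRIs only.
file_iri = "http://example.org/kb/File-1"
file_facet_class = "https://ontology.unifiedcyberontology.org/uco/observable/FileFacet"

# Two tools that know only the File's IRI arrive at the same FileFacet UUID.
assert facet_uuid(file_iri, file_facet_class) == facet_uuid(file_iri, file_facet_class)
print(facet_uuid(file_iri, file_facet_class).version)  # 5
```

Because the result depends only on the seeds, two independent tools can mint the same identifier without exchanging state, which is what makes the deduplication and uniqueness benefits possible.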

@ajnelson-nist ajnelson-nist added this to the 0.11.0 milestone May 4, 2023
@ajnelson-nist ajnelson-nist changed the title Add inherence UUID namespaces Add inherence UUID functions May 4, 2023
@ajnelson-nist ajnelson-nist marked this pull request as ready for review May 9, 2023 14:18
@ajnelson-nist ajnelson-nist requested a review from a team as a code owner May 9, 2023 14:18
@kchason
Member

kchason commented May 9, 2023

Discussed and approved in the 2023-05-09 Adoption Committee meeting

@kchason kchason merged commit ccf32f9 into develop May 9, 2023
ajnelson-nist added a commit to casework/CASE-Implementation-ExifTool that referenced this pull request May 11, 2023
A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE-Utilities-Python#112

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Implementation-ExifTool that referenced this pull request May 11, 2023
References:
* casework/CASE-Utilities-Python#112

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist deleted the add_inherence_uuid_namespaces branch May 11, 2023 12:21
ajnelson-nist added a commit to casework/CASE-Implementation-PyPI-Exifread that referenced this pull request May 11, 2023
Rationale is documented in case-utils PR 112.

References:
* casework/CASE-Utilities-Python#112

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Implementation-PROV-O that referenced this pull request May 18, 2023
…nodes

This patch uses the inherence UUID functions from `case-utils` PR 112 to
replace the blank nodes generated with SPARQL Construct queries.  As
side effects of this migration, some bugs in generating associations
were fixed, and inherence modeling assumptions are now specified
in code comments.

This patch also adds `prov:Start` and `prov:End` nodes to reify
`prov:Activity` (and `case-investigation:InvestigativeAction`) time
boundaries.  This will significantly assist the OWL-Time-based
visualization under development for `case-prov` PR 54.  Creating the
`prov:Start` and `prov:End` nodes as IRI-identified nodes is also
necessary because of a bug observed in `rdf-toolkit`; see their Issue 52.

Since `case_prov_rdf` will now be able to generate non-blank nodes, it
has picked up two behaviors used in other projects importing
`case-utils`:

* The `--use-deterministic-uuids` flag has been added.
* The `CASE_DEMO_NONRANDOM_UUID_BASE` environment variable can now be
  used to make non-inherent deterministic UUIDs.
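
The opt-in behavior of the flag can be sketched with `argparse` and the standard `uuid` module. This is a hypothetical illustration, not the `case_prov_rdf` implementation: `build_parser` and `node_uuid` are invented names, the seeding in the URL namespace is an assumption, and the `CASE_DEMO_NONRANDOM_UUID_BASE` variable is not modeled here.

```python
import argparse
import uuid


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the opt-in flag; everything else is omitted."""
    parser = argparse.ArgumentParser(description="Hypothetical graph generator.")
    parser.add_argument(
        "--use-deterministic-uuids",
        action="store_true",
        help="Derive node UUIDs from a seed (UUIDv5) instead of UUIDv4.",
    )
    return parser


def node_uuid(seed_iri: str, use_deterministic_uuids: bool) -> uuid.UUID:
    """Return a seeded UUIDv5 when opted in, otherwise a random UUIDv4."""
    if use_deterministic_uuids:
        # Assumption: the seed is hashed in the standard URL namespace.
        return uuid.uuid5(uuid.NAMESPACE_URL, seed_iri)
    return uuid.uuid4()


args = build_parser().parse_args(["--use-deterministic-uuids"])
print(node_uuid("http://example.org/kb/activity-1", args.use_deterministic_uuids).version)  # 5
```

Leaving the flag off keeps the default of random UUIDv4 values, matching the opt-in premise described in `case-utils` PR 112.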

A follow-on patch will regenerate Make-managed files.

References:
* #54
* casework/CASE-Utilities-Python#112
* edmcouncil/rdf-toolkit#52

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ajnelson-nist/CASE-Examples-QC that referenced this pull request Dec 18, 2023
References:
* casework/CASE-Utilities-Python#112

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ajnelson-nist/CASE-Examples-QC that referenced this pull request Dec 18, 2023
This is written in response to CASE website Issue 264.  It accommodates
UUIDv5s per the inherent UUID feature added in `case-utils`' PR 112.

This patch is known to not pass `make check` in the current state of the
examples.

References:
* casework/CASE-Utilities-Python#112
* casework/casework.github.io#264

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>