Skip to content

use concatenation of id and collection for elasticsearch _id value #58

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 16, 2022

Conversation

philvarner
Copy link
Collaborator

@philvarner philvarner commented Mar 16, 2022

Related Issue(s):

Description:

  1. Uses a concatenation of the Item id and collection values as the ES document _id. This ensures that only one Item may have the unique (item id, collection id) tuple.
  2. Transaction Extension delete operation no longer looks up the item first, since delete will return a 404 if it doesn't exist already

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • Changes are added to the changelog

@@ -25,7 +25,7 @@ docker-shell:

.PHONY: test
test:
$(run_es) /bin/bash -c 'export && ./scripts/wait-for-it-es.sh elasticsearch:9200 && cd /app/stac_fastapi/elasticsearch/tests/ && pytest'
-$(run_es) /bin/bash -c 'export && ./scripts/wait-for-it-es.sh elasticsearch:9200 && cd /app/stac_fastapi/elasticsearch/tests/ && pytest'
Copy link
Collaborator Author

@philvarner philvarner Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the - at the beginning will allow the execution of docker-compose down even if there are failing tests.

actions = [
{
"_index": ITEMS_INDEX,
"_id": mk_item_id(item["id"], item["collection"]),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly set the document id for bulk insert now

Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@@ -33,6 +33,11 @@
COLLECTIONS_INDEX = "stac_collections"


def mk_item_id(item_id: str, collection_id: str):
"""Make the Elasticsearch document _id value from the Item id and collection."""
return f"{item_id}|{collection_id}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!


es_transactions.delete_collection(
collection_data["id"], request=MockStarletteRequest
)
es_transactions.delete_item(
item_data["id"], coll["id"], request=MockStarletteRequest
item_data["id"], item_data["collection"], request=MockStarletteRequest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test started failing because the wrong collection value was passed (notice it's passing the Item ID again), but the collection was previously ignored in delete, whereas now it's used.

@@ -153,7 +153,10 @@ def app_client(api_client, load_test_data):
try:
client.create_collection(coll, request=MockStarletteRequest)
except ConflictError:
pass
try:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests do not clean up well, especially when any of them fail. This is a stop-gap until that's better handled.

@@ -143,6 +143,8 @@ def test_update_item_missing_collection(app_client, load_test_data):
def test_update_item_geometry(app_client, load_test_data):
test_item = load_test_data("test_item.json")

test_item["id"] = "update_test_item_1"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, another stop-gap. The index should probably be cleared before each run, and each item should be given a random id.

@philvarner philvarner merged commit 7e83c51 into main Mar 16, 2022
@philvarner philvarner deleted the pv/use_unique_es_id branch March 16, 2022 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants