
Retrieve Offsets of inner_hits #2521

Closed
@JKatzwinkel

  • Elasticsearch: 8.6.2
  • spring-data-elasticsearch: 5.1.0-M3

Hi, thanks again for the excellent work, I enjoy it very much!

I'm having an issue while trying to obtain the correct offset values from inner_hits of a nested query via SearchHit#getNestedMetaData. Even though Elasticsearch responds with the expected offsets when I run my queries directly against it, the #getOffset() return values from inner hits via #getInnerHits() are always the same, regardless of how many inner hits are returned within an individual top-level hit.
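For context, the nested metadata of an inner hit forms a chain of field/offset pairs, one per nesting level. The following is a minimal sketch of how such a chain can be flattened into a readable path. The `Nested` record and the `NestedPathDemo` class are hypothetical stand-ins that only mirror the `getField()`, `getOffset()` and `getChild()` accessors used in this issue; they are not the library's own types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the field/offset/child chain exposed by NestedMetaData.
final class NestedPathDemo {

	// One nesting level: field name, offset within that field's array, optional child level.
	record Nested(String field, int offset, Nested child) {}

	// Renders the chain as a path such as "houses[0].inhabitants[1]".
	static String toPath(Nested nested) {
		List<String> parts = new ArrayList<>();
		for (Nested level = nested; level != null; level = level.child()) {
			parts.add(level.field() + "[" + level.offset() + "]");
		}
		return String.join(".", parts);
	}

	public static void main(String[] args) {
		Nested carla = new Nested("houses", 0, new Nested("inhabitants", 1, null));
		System.out.println(toPath(carla)); // prints houses[0].inhabitants[1]
	}
}
```

With distinct offsets per inner hit, two matches in the same house would render as two different paths, which is what this issue expects to see.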

Allow me to illustrate this based on your integration test(s). They run a (nested) nested query and test the offsets of the inner hits, of which there is one on each nesting level:

		Inhabitant john = new Inhabitant("John", "Smith");
		Inhabitant carla = new Inhabitant("Carla", "Miller");
		House cornerHouse = new House("Round the corner", "7", Arrays.asList(john, carla));
		City metropole = new City("Metropole", Arrays.asList(cornerHouse));

		Inhabitant jack = new Inhabitant("Jack", "Wayne");
		Inhabitant emmy = new Inhabitant("Emmy", "Stone");
		House mainStreet = new House("Main Street", "42", Arrays.asList(jack, emmy));
		City village = new City("Village", Arrays.asList(mainStreet));

		operations.save(Arrays.asList(metropole, village));
	}

	@Test
	@Order(java.lang.Integer.MAX_VALUE)
	void cleanup() {
		operations.indexOps(IndexCoordinates.of(indexNameProvider.getPrefix() + "*")).delete();
	}

	@Test
	void shouldReturnInnerHits() {

		Query query = buildQueryForInnerHits("inner_hit_name", "hou-ses.in-habi-tants", "hou-ses.in-habi-tants.first-name",
				"Carla");

		SoftAssertions softly = new SoftAssertions();
		SearchHits<City> searchHits = operations.search(query, City.class);

		softly.assertThat(searchHits.getTotalHits()).isEqualTo(1);

		SearchHit<City> searchHit = searchHits.getSearchHit(0);
		softly.assertThat(searchHit.getInnerHits()).hasSize(1);

		SearchHits<?> innerHits = searchHit.getInnerHits("inner_hit_name");
		softly.assertThat(innerHits).hasSize(1);

		SearchHit<?> innerHit = innerHits.getSearchHit(0);
		Object content = innerHit.getContent();
		assertThat(content).isInstanceOf(Inhabitant.class);
		Inhabitant inhabitant = (Inhabitant) content;
		softly.assertThat(inhabitant.getFirstName()).isEqualTo("Carla");
		softly.assertThat(inhabitant.getLastName()).isEqualTo("Miller");

		NestedMetaData nestedMetaData = innerHit.getNestedMetaData();
		softly.assertThat(nestedMetaData.getField()).isEqualTo("houses");
		softly.assertThat(nestedMetaData.getOffset()).isEqualTo(0); // <-- this seems right
		softly.assertThat(nestedMetaData.getChild().getField()).isEqualTo("inhabitants");
		softly.assertThat(nestedMetaData.getChild().getOffset()).isEqualTo(1);  // this makes sense too

		softly.assertAll();

However ☝, imagine the tests looked like this, now finding 2 inner hits that (as far as I understand) should have different offsets:

		Inhabitant john = new Inhabitant("John", "Smith");
		Inhabitant carla1 = new Inhabitant("Carla", "Miller");
		Inhabitant carla2 = new Inhabitant("Carla", "Nguyen");  // <-- second match/inner hit!!
		House cornerHouse = new House("Round the corner", "7", Arrays.asList(john, carla1, carla2));
		City metropole = new City("Metropole", Arrays.asList(cornerHouse));

		Inhabitant jack = new Inhabitant("Jack", "Wayne");
		Inhabitant emmy = new Inhabitant("Emmy", "Stone");
		House mainStreet = new House("Main Street", "42", Arrays.asList(jack, emmy));
		City village = new City("Village", Arrays.asList(mainStreet));

		operations.save(Arrays.asList(metropole, village));
	}

	@Test
	@Order(java.lang.Integer.MAX_VALUE)
	void cleanup() {
		operations.indexOps(IndexCoordinates.of(indexNameProvider.getPrefix() + "*")).delete();
	}

	private static void testInnerHit(
		SoftAssertions softly, SearchHit<?> innerHit,
		String firstName, String lastName,
		int nestedOffsetLvl1, int nestedOffsetLvl2
	) {
		Object content = innerHit.getContent();
		assertThat(content).isInstanceOf(Inhabitant.class);
		Inhabitant inhabitant = (Inhabitant) content;
		softly.assertThat(inhabitant.getFirstName()).isEqualTo(firstName);
		softly.assertThat(inhabitant.getLastName()).isEqualTo(lastName);

		NestedMetaData nestedMetaData = innerHit.getNestedMetaData();
		softly.assertThat(nestedMetaData.getField()).isEqualTo("houses");
		softly.assertThat(nestedMetaData.getOffset()).isEqualTo(nestedOffsetLvl1);
		softly.assertThat(nestedMetaData.getChild().getField()).isEqualTo("inhabitants");
		softly.assertThat(nestedMetaData.getChild().getOffset()).isEqualTo(nestedOffsetLvl2);
	}

	@Test
	void shouldReturnInnerHits() {

		Query query = buildQueryForInnerHits("inner_hit_name", "hou-ses.in-habi-tants", "hou-ses.in-habi-tants.first-name",
				"Carla");

		SoftAssertions softly = new SoftAssertions();
		SearchHits<City> searchHits = operations.search(query, City.class);

		softly.assertThat(searchHits.getTotalHits()).isEqualTo(1);

		SearchHit<City> searchHit = searchHits.getSearchHit(0);
		softly.assertThat(searchHit.getInnerHits()).hasSize(1);

		SearchHits<?> innerHits = searchHit.getInnerHits("inner_hit_name");
		softly.assertThat(innerHits).hasSize(2);

		testInnerHit(softly, innerHits.getSearchHit(0), "Carla", "Miller", 0, 1);
		testInnerHit(softly, innerHits.getSearchHit(1), "Carla", "Nguyen", 0, 2); // <-- the important bit!

		softly.assertAll();
	}

❌ Now the tests (./mvnw clean verify) fail because all of the inner hits have the same offset (1), even though Elasticsearch itself returns the offsets I expect (1 and 2 in this case). I might be doing something wrong, but on the other hand, the implementation responsible for mapping the inner hit offsets does contain something that confuses me (first and last line):

			NestedMetaData nestedMetaData = searchHits.getSearchHit(0).getContent().getNestedMetaData();
			ElasticsearchPersistentEntityWithNestedMetaData persistentEntityWithNestedMetaData = getPersistentEntity(
					mappingContext.getPersistentEntity(type), nestedMetaData);

			if (persistentEntityWithNestedMetaData.entity != null) {
				List<SearchHit<Object>> convertedSearchHits = new ArrayList<>();
				Class<?> targetType = persistentEntityWithNestedMetaData.entity.getType();

				// convert the list of SearchHit<SearchDocument> to list of SearchHit<Object>
				searchHits.getSearchHits().forEach(searchHit -> {
					SearchDocument searchDocument = searchHit.getContent();

					Object targetObject = converter.read(targetType, searchDocument);
					convertedSearchHits.add(new SearchHit<>(searchDocument.getIndex(), //
							searchDocument.getId(), //
							searchDocument.getRouting(), //
							searchDocument.getScore(), //
							searchDocument.getSortValues(), //
							searchDocument.getHighlightFields(), //
							searchHit.getInnerHits(), //
							persistentEntityWithNestedMetaData.nestedMetaData, // <-- this

So for each inner hit, the same nested metadata is used (persistentEntityWithNestedMetaData.nestedMetaData), even though each inner hit should have its own metadata (a _nested object with field and offset inside) in the Elasticsearch response.
There is probably a good reason for it to be implemented like this, but it seems to cause the unexpected behaviour I described above. I actually managed to fix this very easily by mapping each inner hit's own metadata instead:

					convertedSearchHits.add(new SearchHit<>(searchDocument.getIndex(), //
							searchDocument.getId(), //
							searchDocument.getRouting(), //
							searchDocument.getScore(), //
							searchDocument.getSortValues(), //
							searchDocument.getHighlightFields(), //
							searchHit.getInnerHits(), //
							getPersistentEntity(
								mappingContext.getPersistentEntity(type),
								searchHit.getContent().getNestedMetaData()  // <-- 👀
							).nestedMetaData,
							searchHit.getExplanation(), //
							searchHit.getMatchedQueries(), //
							targetObject));
				});

✅ After this change, the tests pass again.
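The difference between the two mappings can be reproduced in isolation. This is only a sketch under stated assumptions: `RawHit` and `InnerHitOffsetDemo` are hypothetical stand-ins, not types from spring-data-elasticsearch, and the offsets 1 and 2 are the ones Elasticsearch returns in the scenario above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class InnerHitOffsetDemo {

	// Hypothetical stand-in for a raw inner hit that carries its own nested offset.
	record RawHit(String lastName, int offset) {}

	// Mirrors the behaviour described above: the offset is read once from the
	// first hit and reused for every converted hit. Assumes a non-empty list.
	static List<Integer> offsetsFromFirstHit(List<RawHit> hits) {
		int shared = hits.get(0).offset();
		return hits.stream().map(hit -> shared).collect(Collectors.toList());
	}

	// Mirrors the proposed change: the offset is read from each hit individually.
	static List<Integer> offsetsPerHit(List<RawHit> hits) {
		return hits.stream().map(RawHit::offset).collect(Collectors.toList());
	}

	public static void main(String[] args) {
		List<RawHit> hits = List.of(new RawHit("Miller", 1), new RawHit("Nguyen", 2));
		System.out.println(offsetsFromFirstHit(hits)); // prints [1, 1]
		System.out.println(offsetsPerHit(hits)); // prints [1, 2]
	}
}
```

The first method matches the failing assertion (both inner hits report offset 1), while the second matches the expectation in the amended test.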

Am I right to assume that the way in which I amended your integration test(s) is how you are supposed to retrieve the offsets of the inner hits, or am I doing something wrong?

Thanks again and happy Easter!
