Skip to content

In SearchOperations.searchForStream, 'maxResults' overwrites 'size' (pageSize) #3089

Closed
@hy2850

Description

@hy2850

Hi, I'm opening this issue to report something likely to be a bug.
When creating a SearchRequest for searchForStream method, maxResults parameter is overwriting size, while it seems like it shouldn't be.

Bug

I set both maxResult and Pageable.size (will refer to pageSize) fields for the query and passes it to SearchOperations.searchForStream.
When requestConverter.searchRequestinside the method creates a SearchRequest, maxResult overwrites pageSize.

In the below code, builder.size is called first with query.getPageable().getPageSize(), followed by another call with query.getMaxResults(), overwriting the result of the first call.

if (query.getPageable().isPaged()) {
builder //
.from((int) query.getPageable().getOffset()) //
.size(query.getPageable().getPageSize());
} else {
builder.from(0).size(INDEX_MAX_RESULT_WINDOW);
}
if (!isEmpty(query.getFields())) {
var fieldAndFormats = query.getFields().stream().map(field -> FieldAndFormat.of(b -> b.field(field))).toList();
builder.fields(fieldAndFormats);
}
if (!isEmpty(query.getStoredFields())) {
builder.storedFields(query.getStoredFields());
}
if (query.getIndicesOptions() != null) {
addIndicesOptions(builder, query.getIndicesOptions());
}
if (query.isLimiting()) {
builder.size(query.getMaxResults());
}

I was stuck at the case where,

  • 250,000 data available in the ES index
  • Want to fetch at most 150,000 documents in total (out of 250k), using SearchOperations.searchForStream (maxResult 150000)
  • Want to fetch 10,000 documents per page (per scroll search request), which fills almost all the response memory buffer (Pageable.ofSize(10000))

result

  • Expected : 15 search requests in total, a single search returning a page of 10k documents
  • Actual : a single search tries to fetch 150k documents, overflowing response memory buffer and causing below Exception

Caused by: org.springframework.dao.DataAccessResourceFailureException: entity content is too long [169386197] for the configured buffer limit [104857600]; nested exception is java.lang.RuntimeException: entity content is too long [169386197] for the configured buffer limit [104857600]


Possible solution

I think two parameters have totally separate roles, thus one shouldn't overwrite the other.

  • pageSize parameter should decide how many documents that I want ES to fetch per page,
  • while maxResult parameter decides how many documents I want to fetch in total during the whole stream operation.

👉 Therefore, the actual pageSize should be Min(maxResult, pageSize) if maxResult is set

Changes made : https://github.com/hy2850/spring-data-elasticsearch/commits/%233089-size-and-maxResults/


Further explanation with code for your understanding

SearchOperations.searchForStream returns a CloseableIterator that uses scroll API to search through a large set of data by pages.
Currently, maxResult is solely used to get maxCount for this method.

@Override
public <T> SearchHitsIterator<T> searchForStream(Query query, Class<T> clazz, IndexCoordinates index) {
Duration scrollTime = query.getScrollTime() != null ? query.getScrollTime() : Duration.ofMinutes(1);
long scrollTimeInMillis = scrollTime.toMillis();
// noinspection ConstantConditions
int maxCount = query.isLimiting() ? query.getMaxResults() : 0;
return StreamQueries.streamResults( //
maxCount, //
searchScrollStart(scrollTimeInMillis, query, clazz, index), //
scrollId -> searchScrollContinue(scrollId, scrollTimeInMillis, clazz, index), //
this::searchScrollClear);
}

The iterator keeps track of how many documents it has fetched so far, in currentCount instance variable.

@Override
public SearchHit<T> next() {
if (hasNext()) {
currentCount.incrementAndGet();
return currentScrollHits.next();
}
throw new NoSuchElementException();
}

When currentCount reachesmaxCount, the iterator is forced to return false for hasNext, even when there could be more documents available for searching.

@Override
public boolean hasNext() {
boolean hasNext = false;
if (!isClosed && continueScroll && (maxCount <= 0 || currentCount.get() < maxCount)) {
if (!currentScrollHits.hasNext()) {
SearchScrollHits<T> nextPage = continueScrollFunction.apply(scrollState.getScrollId());
currentScrollHits = nextPage.iterator();

So, I reached a conclusion that maxResult should only decide maxCount, not the pageSize, and this is a bug.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions