Description
Hi, I'm opening this issue to report something likely to be a bug.
When a SearchRequest is created for the searchForStream method, the maxResults parameter overwrites size, although it seems like it shouldn't.
Bug
I set both the maxResult and Pageable.size (referred to below as pageSize) fields on the query and pass it to SearchOperations.searchForStream.
When requestConverter.searchRequest creates a SearchRequest inside that method, maxResult overwrites pageSize.
In the request-building code, builder.size is first called with query.getPageable().getPageSize() and then called again with query.getMaxResults(), which overwrites the result of the first call.
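To make the ordering problem concrete, here is a minimal Java sketch that models the two builder.size calls. RequestBuilderModel and buildSize are hypothetical names, not Spring Data Elasticsearch API, and the null check on maxResults is an assumption about when the second call happens. The only point it makes is that a setter called twice keeps the last value.

```java
// Simplified, hypothetical model of the request-building step described above.
// RequestBuilderModel and buildSize are illustrative names, not Spring Data Elasticsearch API.
class SizeOverwriteDemo {

    // Stand-in for a search request builder whose size(...) setter keeps only the last value.
    static final class RequestBuilderModel {
        private Integer size;

        RequestBuilderModel size(int size) {
            this.size = size; // each call replaces the previously stored value
            return this;
        }

        Integer size() {
            return size;
        }
    }

    // Mirrors the call order reported in this issue: page size first, then maxResults.
    // Assumption: the second call only happens when maxResults is set (non-null).
    static int buildSize(int pageSize, Integer maxResults) {
        RequestBuilderModel builder = new RequestBuilderModel();
        builder.size(pageSize);
        if (maxResults != null) {
            builder.size(maxResults); // overwrites the page size set just above
        }
        return builder.size();
    }

    public static void main(String[] args) {
        System.out.println(buildSize(10_000, 150_000)); // 150000, not 10000
        System.out.println(buildSize(10_000, null));    // 10000
    }
}
```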
I ran into this with the following case:
- 250,000 documents are available in the ES index
- I want to fetch at most 150,000 documents in total (out of 250k) using SearchOperations.searchForStream (maxResult 150000)
- I want to fetch 10,000 documents per page (per scroll search request), which fills almost all of the response memory buffer (Pageable.ofSize(10000))
Result
- Expected: 15 search requests in total, each returning a page of 10k documents
- Actual: a single search request tries to fetch 150k documents, overflowing the response memory buffer and causing the exception below
Caused by: org.springframework.dao.DataAccessResourceFailureException: entity content is too long [169386197] for the configured buffer limit [104857600]; nested exception is java.lang.RuntimeException: entity content is too long [169386197] for the configured buffer limit [104857600]
Possible solution
I think the two parameters have entirely separate roles, so one shouldn't overwrite the other.
The pageSize parameter should decide how many documents ES fetches per page, while the maxResult parameter decides how many documents I want to fetch in total during the whole stream operation.
👉 Therefore, if maxResult is set, the actual page size should be min(maxResult, pageSize).
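A minimal sketch of the proposed rule, assuming pageSize is positive and maxResults is either unset (null) or non-negative. effectivePageSize and scrollRequestCount are hypothetical helpers for illustration only, not the code in the linked branch. With the numbers from the case above they give a page size of 10,000 and 15 scroll requests.

```java
// Sketch of the proposed sizing rule. effectivePageSize and scrollRequestCount are
// hypothetical helpers for illustration, not part of Spring Data Elasticsearch.
class ProposedSizing {

    // Per-request size: the page size, capped by maxResults when maxResults is set.
    static int effectivePageSize(int pageSize, Integer maxResults) {
        return maxResults == null ? pageSize : Math.min(maxResults, pageSize);
    }

    // Number of scroll requests needed to collect maxResults documents, assuming the index
    // holds at least maxResults documents so every page except possibly the last is full.
    static int scrollRequestCount(int pageSize, int maxResults) {
        if (maxResults <= 0) {
            return 0; // nothing to fetch
        }
        int perRequest = effectivePageSize(pageSize, maxResults);
        return (maxResults + perRequest - 1) / perRequest; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(effectivePageSize(10_000, 150_000));  // 10000
        System.out.println(effectivePageSize(10_000, 500));      // 500
        System.out.println(scrollRequestCount(10_000, 150_000)); // 15
    }
}
```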
Changes made: https://github.com/hy2850/spring-data-elasticsearch/commits/%233089-size-and-maxResults/
Further explanation with code for your understanding
SearchOperations.searchForStream returns a CloseableIterator that uses the scroll API to page through a large set of data.
Currently, maxResult is used only to derive maxCount for this method.
The iterator tracks how many documents it has fetched so far in the currentCount instance variable.
When currentCount reaches maxCount, the iterator's hasNext is forced to return false, even if more documents are available to search.
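The capping behavior can be illustrated with a simplified, hypothetical iterator. It is not Spring Data's CloseableIterator implementation: documents are plain integers and each fetchPage() call stands in for one scroll request, while currentCount and maxCount play the roles described above. With 250,000 documents, pages of 10,000 and maxCount 150,000, it stops after 15 requests; with a page size of 150,000 (the current behavior), a single request has to carry all 150,000 documents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified, hypothetical model of the stream iterator, not Spring Data's implementation.
// "Documents" are integers and each fetchPage() call stands in for one scroll request.
class CappedScrollIterator implements Iterator<Integer> {

    private final int totalDocs;   // documents available in the simulated index
    private final int pageSize;    // documents returned per simulated scroll request
    private final long maxCount;   // derived from maxResults; caps the whole stream
    private long currentCount = 0; // documents handed out so far
    private int nextDocId = 0;     // cursor into the simulated index
    private int requests = 0;      // simulated scroll requests issued so far
    private final Deque<Integer> page = new ArrayDeque<>();

    CappedScrollIterator(int totalDocs, int pageSize, long maxCount) {
        this.totalDocs = totalDocs;
        this.pageSize = pageSize;
        this.maxCount = maxCount;
    }

    // Loads the next page of up to pageSize documents into the buffer.
    private void fetchPage() {
        requests++;
        int end = Math.min(nextDocId + pageSize, totalDocs);
        for (int id = nextDocId; id < end; id++) {
            page.add(id);
        }
        nextDocId = end;
    }

    @Override
    public boolean hasNext() {
        if (currentCount >= maxCount) {
            return false; // cap reached, even if the index still has documents
        }
        if (page.isEmpty() && nextDocId < totalDocs) {
            fetchPage();
        }
        return !page.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        currentCount++;
        return page.poll();
    }

    int requests() {
        return requests;
    }

    public static void main(String[] args) {
        CappedScrollIterator it = new CappedScrollIterator(250_000, 10_000, 150_000);
        long seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        // prints "150000 documents in 15 requests"
        System.out.println(seen + " documents in " + it.requests() + " requests");
    }
}
```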
So I concluded that maxResult should determine only maxCount, not the page size, and that the current behavior is a bug.