In SearchOperations.searchForStream, 'maxResults' overwrites 'size' (pageSize)

Hi, I'm opening this issue to report something likely to be a bug. 
When creating a `SearchRequest` for `searchForStream` method, `maxResults` parameter is overwriting `size`, while it seems like it shouldn't be.

## Bug
I set both `maxResult` and `Pageable.size` (will refer to `pageSize`) fields for the query and passes it to SearchOperations.searchForStream. 
When `requestConverter.searchRequest`inside the method creates a `SearchRequest`, `maxResult` overwrites `pageSize`.

In the below code, `builder.size` is called first with `query.getPageable().getPageSize()`, followed by another call with `query.getMaxResults()`, overwriting the result of the first call.
https://github.com/spring-projects/spring-data-elasticsearch/blob/0e5af905818ca852c377c0aa576e56516b713dac/src/main/java/org/springframework/data/elasticsearch/client/elc/RequestConverter.java#L1469-L1492

I was stuck at the case where,
- 250,000 data available in the ES index
- Want to fetch at most 150,000 documents in total (out of 250k), using `SearchOperations.searchForStream` (`maxResult 150000`)
- Want to fetch 10,000 documents per page (per scroll search request), which fills almost all the response memory buffer (`Pageable.ofSize(10000)`)

result
- Expected : 15 search requests in total, a single search returning a page of 10k documents
- Actual : a single search tries to fetch 150k documents, overflowing response memory buffer and causing below Exception

`Caused by: org.springframework.dao.DataAccessResourceFailureException: entity content is too long [169386197] for the configured buffer limit [104857600]; nested exception is java.lang.RuntimeException: entity content is too long [169386197] for the configured buffer limit [104857600]`

<br>

## Possible solution
I think two parameters have totally separate roles, thus one shouldn't overwrite the other.
- `pageSize` parameter should decide how many documents that I want ES to fetch per page, 
- while `maxResult` parameter decides how many documents I want to fetch **_in total_** during the whole stream operation. 

👉 Therefore, the actual pageSize should be `Min(maxResult, pageSize)` if `maxResult` is set

Changes made : https://github.com/hy2850/spring-data-elasticsearch/commits/%233089-size-and-maxResults/

<br>

## Further explanation with code for your understanding
`SearchOperations.searchForStream` returns a CloseableIterator that uses scroll API to search through a large set of data by pages.
Currently, `maxResult` is solely used to get `maxCount` for this method.

https://github.com/spring-projects/spring-data-elasticsearch/blob/0e5af905818ca852c377c0aa576e56516b713dac/src/main/java/org/springframework/data/elasticsearch/core/AbstractElasticsearchTemplate.java#L410-L423

The iterator keeps track of how many documents it has fetched so far, in `currentCount` instance variable.
https://github.com/spring-projects/spring-data-elasticsearch/blob/0e5af905818ca852c377c0aa576e56516b713dac/src/main/java/org/springframework/data/elasticsearch/core/StreamQueries.java#L130-L137

When `currentCount` reaches`maxCount`, the iterator is forced to return false for `hasNext`, even when there could be more documents available for searching. 
https://github.com/spring-projects/spring-data-elasticsearch/blob/0e5af905818ca852c377c0aa576e56516b713dac/src/main/java/org/springframework/data/elasticsearch/core/StreamQueries.java#L107-L116

So, I reached a conclusion that `maxResult` should only decide `maxCount`, not the `pageSize`, and this is a bug.

	if (query.getPageable().isPaged()) {
	builder //
	.from((int) query.getPageable().getOffset()) //
	.size(query.getPageable().getPageSize());
	} else {
	builder.from(0).size(INDEX_MAX_RESULT_WINDOW);
	}

	if (!isEmpty(query.getFields())) {
	var fieldAndFormats = query.getFields().stream().map(field -> FieldAndFormat.of(b -> b.field(field))).toList();
	builder.fields(fieldAndFormats);
	}

	if (!isEmpty(query.getStoredFields())) {
	builder.storedFields(query.getStoredFields());
	}

	if (query.getIndicesOptions() != null) {
	addIndicesOptions(builder, query.getIndicesOptions());
	}

	if (query.isLimiting()) {
	builder.size(query.getMaxResults());
	}

	@Override
	public <T> SearchHitsIterator<T> searchForStream(Query query, Class<T> clazz, IndexCoordinates index) {

	Duration scrollTime = query.getScrollTime() != null ? query.getScrollTime() : Duration.ofMinutes(1);
	long scrollTimeInMillis = scrollTime.toMillis();
	// noinspection ConstantConditions
	int maxCount = query.isLimiting() ? query.getMaxResults() : 0;

	return StreamQueries.streamResults( //
	maxCount, //
	searchScrollStart(scrollTimeInMillis, query, clazz, index), //
	scrollId -> searchScrollContinue(scrollId, scrollTimeInMillis, clazz, index), //
	this::searchScrollClear);
	}

	@Override
	public SearchHit<T> next() {
	if (hasNext()) {
	currentCount.incrementAndGet();
	return currentScrollHits.next();
	}
	throw new NoSuchElementException();
	}

	@Override
	public boolean hasNext() {

	boolean hasNext = false;

	if (!isClosed && continueScroll && (maxCount <= 0 \|\| currentCount.get() < maxCount)) {

	if (!currentScrollHits.hasNext()) {
	SearchScrollHits<T> nextPage = continueScrollFunction.apply(scrollState.getScrollId());
	currentScrollHits = nextPage.iterator();

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

In SearchOperations.searchForStream, 'maxResults' overwrites 'size' (pageSize) #3089

Bug

Possible solution

Further explanation with code for your understanding

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

In SearchOperations.searchForStream, 'maxResults' overwrites 'size' (pageSize) #3089

Description

Bug

Possible solution

Further explanation with code for your understanding

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions