Description
I've noticed poor performance with some Solr queries that involve deep paging (http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/).
Here is the use case:
I need to pull all keys from a bucket that match a given filter.
Right now I do something like this:
def list_keys(bucket, my_filter):
    out = []
    start = 0
    rows = 1000
    done = False
    while not done:
        # Fetch one page of matching keys, beginning at offset `start`.
        results = bucket.search(my_filter, fl="_yz_rk", start=start, rows=rows)
        out.extend([x["_yz_rk"] for x in results['docs']])
        start += rows
        # A short page means there is nothing left to fetch.
        if len(results['docs']) < rows:
            done = True
    return out
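The problem is that the deeper I go into the index, the slower each bucket.search call gets, because every request has to rank everything up to start + rows before it can return the page. It is especially noticeable when a sort is added to the search.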
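To put rough numbers on that, here is a small back-of-the-envelope sketch. It is a simplified cost model, not a measurement: it only counts the size of the sorted window that has to be built per request (start + rows for offset paging, rows for cursor paging), and the function names are made up for illustration.

```python
def window_cost_with_start(total_matches, rows):
    """Sum of (start + rows) over every page when paging by offset."""
    total = 0
    start = 0
    while start < total_matches:
        # Each request must rank everything up to start + rows.
        total += min(start + rows, total_matches)
        start += rows
    return total


def window_cost_with_cursor(total_matches, rows):
    """Sum of the page sizes when each request only ranks one page."""
    total = 0
    remaining = total_matches
    while remaining > 0:
        total += min(rows, remaining)
        remaining -= rows
    return total


if __name__ == "__main__":
    print(window_cost_with_start(1_000_000, 1000))
    print(window_cost_with_cursor(1_000_000, 1000))
```

Under this model, the offset approach grows quadratically with the number of pages, while the cursor approach grows linearly.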
Solr fixed this in 4.7 by accepting a cursor (cursorMark) instead of the 'start' parameter. Therefore, I would expect to be able to do something like this:
def list_keys(bucket, my_filter):
    out = []
    cursorMark = "*"  # "*" asks Solr to start from the beginning
    rows = 1000
    done = False
    while not done:
        # Note: Solr requires a sort that includes the uniqueKey field
        # when cursorMark is used.
        results = bucket.search(my_filter, fl="_yz_rk", rows=rows, cursorMark=cursorMark)
        cursorMark = results['nextCursorMark']
        out.extend([x["_yz_rk"] for x in results['docs']])
        if len(results['docs']) < rows:
            done = True
    return out
As it turns out, the Python client currently does not pass back 'nextCursorMark'. Only docs, max_score and num_found are returned in the results object, so there is no way to continue from the cursor.
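For reference, this is roughly the loop I would like to be able to write once 'nextCursorMark' is available. It is only a sketch: fetch_page is a hypothetical callable (not part of the Python client) that takes a cursor mark and a row count and returns a dict with 'docs' and 'nextCursorMark', and make_fake_backend is an in-memory stand-in so the loop can be run without a cluster. It stops when the returned cursor equals the one that was sent, which is how the Solr documentation describes detecting the end of the results.

```python
import bisect


def list_keys_with_cursor(fetch_page, rows=1000):
    """Collect every key by following cursor marks until they stop changing."""
    keys = []
    cursor_mark = "*"  # "*" means "start from the beginning"
    while True:
        results = fetch_page(cursor_mark, rows)
        keys.extend(doc["_yz_rk"] for doc in results["docs"])
        next_mark = results["nextCursorMark"]
        # An unchanged cursor means there are no more results.
        if next_mark == cursor_mark:
            break
        cursor_mark = next_mark
    return keys


def make_fake_backend(all_keys):
    """Hypothetical in-memory backend that mimics cursor paging over sorted keys."""
    ordered = sorted(all_keys)

    def fetch_page(cursor_mark, rows):
        # Resume just after the last key seen, or at 0 for the initial "*".
        if cursor_mark == "*":
            begin = 0
        else:
            begin = bisect.bisect_right(ordered, cursor_mark)
        page = ordered[begin:begin + rows]
        # Here the cursor is simply the last key returned; an empty page
        # echoes the incoming cursor back unchanged.
        next_mark = page[-1] if page else cursor_mark
        return {"docs": [{"_yz_rk": k} for k in page], "nextCursorMark": next_mark}

    return fetch_page


if __name__ == "__main__":
    sample = ["key-%04d" % i for i in range(2500)]
    print(len(list_keys_with_cursor(make_fake_backend(sample), rows=1000)))
```

In a real setup, fetch_page would wrap whatever call ends up exposing the raw Solr response, including 'nextCursorMark'.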
Thank you,
Steve