
Python client does not support SOLR deep paging cursors #356

Open
@stevegaron

Description


I've noticed poor performance with some SOLR queries related to deep paging (http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/).

Here is the use case:
I need to pull all keys from a bucket that match a given filter.

Right now I do something like this:

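def list_keys(bucket, my_filter):
    """Collect the Riak key (_yz_rk) of every document matching my_filter."""
    out = []
    start = 0
    rows = 1000
    done = False

    while not done:
        # Offset paging: each call asks SOLR for rows docs after skipping start.
        results = bucket.search(my_filter, fl="_yz_rk", start=start, rows=rows)
        out.extend([x["_yz_rk"] for x in results['docs']])
        start += rows
        # A short page means the last page has been reached.
        if len(results['docs']) < rows:
            done = True

    return out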

The problem is that the deeper I page into the index, the slower bucket.search gets: each request makes SOLR rank start + rows documents only to return the last rows of them. This is especially noticeable when you add a sort to the search.
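A rough back-of-the-envelope model of that slowdown, assuming (as a simplification) that the dominant cost is the number of top documents SOLR must rank per request: with start/rows paging, page k forces SOLR to rank start + rows matches, so the total work grows quadratically with the number of pages, while a cursor only needs rows per page. The function names are illustrative only.

```python
def docs_ranked_with_offset(pages: int, rows: int) -> int:
    """Total documents ranked across all pages when paging with start/rows.

    Page k (0-based) has start = k * rows, so SOLR ranks start + rows docs.
    """
    return sum(k * rows + rows for k in range(pages))


def docs_ranked_with_cursor(pages: int, rows: int) -> int:
    """Total documents ranked across all pages when paging with cursorMark.

    Each request only needs the next `rows` docs after the cursor position.
    """
    return pages * rows


print(docs_ranked_with_offset(10, 1000))  # 55000
print(docs_ranked_with_cursor(10, 1000))  # 10000
```

For 10 pages of 1000 rows this model gives 55,000 ranked documents with start/rows against 10,000 with a cursor, and the gap widens with every additional page.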

SOLR fixed the issue in 4.7 by letting you pass a cursor (the 'cursorMark' parameter) instead of using the 'start' parameter. Therefore I would expect to be able to do something like this:

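def list_keys(bucket, my_filter):
    """Same as above, but paging with a SOLR 4.7+ cursor instead of start."""
    out = []
    cursorMark = "*"
    rows = 1000
    done = False

    while not done:
        # SOLR rejects cursorMark unless the sort includes the uniqueKey field
        # (_yz_id in the default Riak search schema) as a tie-breaker.
        results = bucket.search(my_filter, fl="_yz_rk", rows=rows,
                                sort="_yz_id asc", cursorMark=cursorMark)
        cursorMark = results['nextCursorMark']
        out.extend([x["_yz_rk"] for x in results['docs']])
        if len(results['docs']) < rows:
            done = True

    return out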
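SOLR also documents a termination rule for cursors: when the nextCursorMark in a response equals the cursorMark that was sent, there are no more results. Below is a minimal sketch of a loop built on that rule that can be run without a cluster. collect_keys and make_fake_search are hypothetical helpers, not part of the Riak Python client, and the fake only mimics cursor semantics; real SOLR cursors are opaque strings.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (query, rows, cursor_mark) -> dict shaped like the
# SOLR JSON response, including nextCursorMark.
SearchFn = Callable[[str, int, str], Dict]


def collect_keys(search: SearchFn, query: str, rows: int = 1000) -> List[str]:
    """Page through every match using SOLR-style cursorMark semantics."""
    keys: List[str] = []
    cursor = "*"  # SOLR's documented starting cursor
    while True:
        results = search(query, rows, cursor)
        keys.extend(doc["_yz_rk"] for doc in results["docs"])
        next_cursor = results["nextCursorMark"]
        # SOLR signals the end by returning the same cursor that was sent.
        if next_cursor == cursor:
            return keys
        cursor = next_cursor


def make_fake_search(all_keys: List[str]) -> SearchFn:
    """In-memory stand-in for SOLR, for illustration only.

    Here the 'cursor' is simply the last key returned; clients must never
    parse real SOLR cursors like this.
    """
    ordered = sorted(all_keys)  # stands in for the uniqueKey sort SOLR requires

    def search(query: str, rows: int, cursor: str) -> Dict:
        remaining = ordered if cursor == "*" else [k for k in ordered if k > cursor]
        page = remaining[:rows]
        return {
            "docs": [{"_yz_rk": k} for k in page],
            "nextCursorMark": page[-1] if page else cursor,
        }

    return search


search = make_fake_search(["k%03d" % i for i in range(25)])
print(len(collect_keys(search, "*:*", rows=10)))  # 25
```

Compared with checking for a short page, the equality check costs one extra (empty) request at the end, but it never stops early if the server returns fewer rows than asked for.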

As it turns out, the Python client currently does not pass back 'nextCursorMark': only docs, max_score and num_found are returned in the results object.
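For illustration, here is a minimal sketch of how a decoder for the raw SOLR JSON body could keep the field. In that body, nextCursorMark sits at the top level next to response. normalize_search_response is a hypothetical function, not the client's actual code, and it only covers a JSON (HTTP) response; the Protocol Buffers transport is not addressed here.

```python
import json
from typing import Any, Dict


def normalize_search_response(body: str) -> Dict[str, Any]:
    """Turn a raw SOLR JSON response into a client-style result dict.

    Hypothetical helper: keeps docs, max_score and num_found, and additionally
    passes nextCursorMark through when SOLR includes it.
    """
    raw = json.loads(body)
    response = raw.get("response", {})
    result: Dict[str, Any] = {
        "docs": response.get("docs", []),
        "num_found": response.get("numFound", 0),
    }
    if "maxScore" in response:
        result["max_score"] = response["maxScore"]
    # Only present when the request carried a cursorMark parameter.
    if "nextCursorMark" in raw:
        result["nextCursorMark"] = raw["nextCursorMark"]
    return result
```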

Thank you,
Steve
