Problem
When we need to index a large number of documents, we have to use the queryset_pagination
Meta option to paginate. The Django paginator needs a queryset sorted with order_by
(see the Django pagination docs); otherwise the same pk can appear on more than one page while others go missing (as in #71).
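To illustrate why ordering matters, here is a small pure-Python simulation (no Django involved). It assumes, as a simplification, that a query without ORDER BY may return rows in a different arbitrary order each time it runs. Slicing such results page by page with offset/limit can then return some pks twice and skip others, while a consistently sorted sequence is covered exactly once. The paginate helper and the 100 pks are hypothetical.

```python
import random


def paginate(rows, page_size, rng, ordered):
    """Collect every page of rows using offset/limit slicing.

    When ordered is False, each page query is simulated as returning the rows
    in a fresh arbitrary order (assumption: no ORDER BY, unstable order)."""
    seen = []
    for offset in range(0, len(rows), page_size):
        if ordered:
            result = sorted(rows)  # like order_by('pk'): same order on every query
        else:
            result = rows[:]
            rng.shuffle(result)  # a different order on every query
        seen.extend(result[offset:offset + page_size])
    return seen


rows = list(range(1, 101))  # 100 hypothetical pks
rng = random.Random(0)

unordered = paginate(rows, 10, rng, ordered=False)
ordered = paginate(rows, 10, rng, ordered=True)

print(len(unordered), len(set(unordered)))  # 100 results, but typically fewer distinct pks
print(sorted(ordered) == rows)               # True: every pk exactly once
```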
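Adding order_by to the queryset makes the Django paginator run an ordered query for every page, sliced with LIMIT/OFFSET. On a huge queryset (around 10 million rows) this leads to a serious performance problem, because each page's OFFSET forces the database to walk past all the preceding rows.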
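A rough back-of-the-envelope model of that OFFSET cost, as a sketch only: it assumes the database reads every skipped row plus the rows of the page, and it ignores sorting, caching and index details. The page size of 1000 is a hypothetical queryset_pagination value, not one taken from this issue.

```python
def offset_rows_scanned(total, page_size):
    """Rows read when every page is fetched with LIMIT page_size OFFSET k.

    Simplified model (assumption): the database walks past all `offset`
    skipped rows, then reads the rows of the page itself."""
    scanned = 0
    for offset in range(0, total, page_size):
        scanned += offset + min(page_size, total - offset)
    return scanned


total = 10_000_000  # order of magnitude from the issue
page_size = 1000    # hypothetical queryset_pagination value

offset_cost = offset_rows_scanned(total, page_size)
range_cost = total  # same model: pk-range chunks read each row once

print(offset_cost)               # 50005000000
print(offset_cost / range_cost)  # 5000.5
```

Under this model the total work of OFFSET pagination grows quadratically with the number of pages, while range-based chunks stay linear in the number of rows.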
Temporary solution:
We can override the _get_actions method (from django_elasticsearch_dsl.documents.DocType) so that it bypasses the Django paginator when a queryset is passed. Moreover, because of the way database indexes work, it is better to first fetch only the sorted pks and then run sub-queries over pk ranges, which the primary-key index can serve directly.
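The pk-range idea can be sketched without Django. pk_ranges and select_range below are hypothetical helpers for illustration: the first splits the sorted pk list into chunks of exactly page_size pks and yields inclusive (first, last) bounds; the second stands in for filter(pk__range=...), which maps to SQL BETWEEN and is inclusive on both ends. Gaps in the pks (for example after deletions) do not matter, because the bounds are taken from the actual pks.

```python
from typing import Iterator, List, Sequence, Tuple


def pk_ranges(pks: Sequence[int], page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_pk, last_pk) bounds covering sorted pks in chunks
    of at most page_size pks, suitable for a pk__range sub-query."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(pks), page_size):
        end = min(start + page_size, len(pks)) - 1  # last index of this chunk
        yield pks[start], pks[end]


def select_range(pks: Sequence[int], bounds: Tuple[int, int]) -> List[int]:
    """Stand-in for object_list.filter(pk__range=bounds): both ends inclusive."""
    low, high = bounds
    return [pk for pk in pks if low <= pk <= high]


# Hypothetical pks with gaps, already sorted as by order_by('pk').
pks = [1, 2, 5, 8, 9, 13, 21, 22, 30, 31, 40]
ranges = list(pk_ranges(pks, 4))
print(ranges)  # [(1, 8), (9, 22), (30, 40)]

# Every pk is fetched exactly once across the sub-queries.
fetched = [pk for bounds in ranges for pk in select_range(pks, bounds)]
assert fetched == pks
```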
from django.db.models.query import QuerySet


# Method to define on your DocType subclass.
def _get_actions(self, object_list, action):
    page_size = self._doc_type.queryset_pagination
    if page_size and isinstance(object_list, QuerySet):
        # Fetch only the pks, sorted, in a single query.
        pks = list(object_list.order_by('pk').values_list('pk', flat=True))
        for start in range(0, len(pks), page_size):
            # Index of the last pk of this chunk; the final chunk may be shorter.
            end = min(start + page_size, len(pks)) - 1
            # pk__range is inclusive (SQL BETWEEN), so chunks of exactly
            # page_size pks neither overlap nor skip rows.
            ranged_qs = object_list.filter(pk__range=(pks[start], pks[end]))
            for object_instance in ranged_qs:
                yield self._prepare_action(object_instance, action)
    else:
        # No pagination requested, or not a queryset: keep the default behavior.
        yield from super()._get_actions(object_list, action)
I'm available to make the PR if needed.