From 98d0f373eaca1ea3634a79c814f4013478608312 Mon Sep 17 00:00:00 2001
From: Elastic Machine
`search_after` requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.
- A point in time must be opened explicitly before being used in search requests.
- The `keep_alive` parameter tells Elasticsearch how long it should persist.
A point in time must be opened explicitly before being used in search requests.
+A subsequent search request with the `pit` parameter must not specify `index`, `routing`, or `preference` values, as these parameters are copied from the point in time.
Just like regular searches, you can use `from` and `size` to page through point in time search results, up to the first 10,000 hits.
+ If you want to retrieve more hits, use PIT with `search_after`.
IMPORTANT: The open point in time request and each subsequent search request can return different identifiers; always use the most recently received ID for the next search request.
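The paging rules above can be sketched in Python. This is a minimal illustration, not client code. It assumes the search body's `pit` object (with `id` and `keep_alive`) and its `sort` and `search_after` fields. The helper `next_page_body`, the `@timestamp` sort field, the sample IDs, and the sample response are hypothetical. The sketch only builds request bodies and sends nothing.

```python
from typing import Any, Optional


def next_page_body(
    prev_response: Optional[dict[str, Any]],
    pit_id: str,
    page_size: int = 100,
    keep_alive: str = "1m",
) -> dict[str, Any]:
    """Build the body of the next PIT search request.

    The body is meant for `POST /_search` with no index in the path,
    because index, routing, and preference come from the point in time.
    """
    if prev_response is not None:
        # Always reuse the most recently returned PIT ID.
        pit_id = prev_response.get("pit_id", pit_id)

    body: dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # search_after needs a sort; "@timestamp" is a hypothetical field.
        "sort": [{"@timestamp": "asc"}],
    }

    if prev_response is not None:
        hits = prev_response.get("hits", {}).get("hits", [])
        if hits:
            # Resume after the sort values of the last hit on the previous page.
            body["search_after"] = hits[-1]["sort"]
    return body


# Usage with a hand-written, hypothetical previous response.
first = next_page_body(None, pit_id="pit-from-open-request")
second = next_page_body(
    {"pit_id": "refreshed-pit-id", "hits": {"hits": [{"sort": [1700000000000, 42]}]}},
    pit_id="pit-from-open-request",
)
```

Passing `keep_alive` on every page also reflects the point that each request only needs to keep the PIT alive until the next one.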
+When a PIT that contains shard failures is used in a search request, the missing shards are always reported in the search response as a `NoShardAvailableActionException` exception.
+ To get rid of these exceptions, a new PIT needs to be created so that shards missing from the previous PIT can be handled, assuming they become available in the meantime.
Keeping point in time alive
+The `keep_alive` parameter, which is passed to an open point in time request and search request, extends the time to live of the corresponding point in time.
+ The value does not need to be long enough to process all data; it just needs to be long enough for the next request.
Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments.
+ Once the smaller segments are no longer needed, they are deleted.
+ However, open point-in-times prevent the old segments from being deleted since they are still in use.
+TIP: Keeping older segments alive means that more disk space and file handles are needed.
+ Ensure that you have configured your nodes to have ample free file handles.
+Additionally, if a segment contains deleted or updated documents, then the point in time must keep track of whether each document in the segment was live at the time of the initial search request.
+ Ensure that your nodes have sufficient heap space if you have many open point-in-times on an index that is subject to ongoing deletes or updates.
+ Note that a point-in-time doesn't prevent its associated indices from being deleted.
+ You can check how many point-in-times (that is, search contexts) are open with the nodes stats API.
Get search hits that match the query defined in the request.
You can provide search queries using the `q` query string parameter or the request body.
If both are specified, only the query parameter is used.
If the Elasticsearch security features are enabled, you must have the `read` index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges.
+ To search a point in time (PIT) for an alias, you must have the `read` index privilege for the alias's data streams or indices.
Search slicing
+When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the `slice` and `pit` properties.
+ By default the splitting is done first on the shards, then locally on each shard.
+ The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.
For instance, if the number of shards is equal to 2 and you request 4 slices, slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.
+IMPORTANT: The same point-in-time ID should be used for all slices.
+ If different PIT IDs are used, slices can overlap and miss documents.
+ This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
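The shard assignment in the example above matches a round-robin rule (`slice_id % num_shards`) when at least as many slices as shards are requested. The sketch below assumes that rule and the search body's `slice` object (`id`, `max`). The helper names are hypothetical, and the sketch does not model how each shard is subdivided further. Every slice body carries the same PIT ID, as the IMPORTANT note requires.

```python
from typing import Any


def shard_for_slice(slice_id: int, num_slices: int, num_shards: int) -> int:
    """Shard targeted by a slice, assuming round-robin assignment.

    Only models num_slices >= num_shards, the case in the example.
    """
    if not 0 <= slice_id < num_slices:
        raise ValueError("slice_id must be in [0, num_slices)")
    if num_slices < num_shards:
        raise ValueError("this sketch only models num_slices >= num_shards")
    return slice_id % num_shards


def slice_bodies(pit_id: str, num_slices: int) -> list[dict[str, Any]]:
    """One search body per slice, all sharing the same PIT ID."""
    return [
        {
            "slice": {"id": i, "max": num_slices},
            "pit": {"id": pit_id},
            "query": {"match_all": {}},
        }
        for i in range(num_slices)
    ]


# Reproduce the example: 2 shards, 4 slices.
assignment: dict[int, list[int]] = {}
for s in range(4):
    assignment.setdefault(shard_for_slice(s, 4, 2), []).append(s)
print(assignment)  # {0: [0, 2], 1: [1, 3]}
```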
Cancel a migration reindex operation.
+Cancel a migration reindex attempt for a data stream or index.

Create an index from a source index.
+Copy the mappings and settings from the source index to a destination index while allowing request settings and mappings to override the source values.

Get the migration reindexing status.
+Get the status of a migration reindex attempt for a data stream or index.

Reindex legacy backing indices.
+Reindex all legacy backing indices for a data stream.
+ This operation occurs in a persistent task.
+ The persistent task ID is returned immediately and the reindexing work is completed in that task.
Delete documents.
- Deletes documents that match the specified query.
+Delete documents.
+Deletes documents that match the specified query.
+If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
+* `read`
+* `delete` or `write`
You can specify the query criteria in the request URI or the request body using the same syntax as the search API.
+ When you submit a delete by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and deletes matching documents using internal versioning.
+ If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.
+NOTE: Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.
+While processing a delete by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents to delete.
+ A bulk delete request is performed for each batch of matching documents.
+ If a search or bulk request is rejected, the requests are retried up to 10 times, with exponential backoff.
+ If the maximum retry limit is reached, processing halts and all failed requests are returned in the response.
+ Any delete requests that completed successfully still stick; they are not rolled back.
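Here is a minimal sketch of the retry behavior. It assumes a hypothetical `Rejected` exception and a caller-supplied `sleep` function, injected so the example runs instantly. The 0.5-second initial delay and the doubling factor are illustrative assumptions, not documented defaults. Only the limit of 10 retries comes from the text.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Rejected(Exception):
    """Hypothetical stand-in for a rejected search or bulk request."""


def with_backoff(
    request: Callable[[], T],
    sleep: Callable[[float], None],
    max_retries: int = 10,
    initial_delay: float = 0.5,
) -> T:
    """Run `request`, retrying a rejection up to `max_retries` times.

    The delay doubles after every rejection. When the retry limit is
    exhausted, the last rejection is re-raised so the caller can report it.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Rejected:
            if attempt == max_retries:
                raise  # retry limit reached: processing halts
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


# Usage: a request that is rejected three times, then succeeds.
calls = {"n": 0}
delays: list[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] <= 3:
        raise Rejected()
    return "ok"


print(with_backoff(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0, 2.0]
```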
+You can opt to count version conflicts instead of halting and returning by setting `conflicts` to `proceed`.
+ Note that if you opt to count version conflicts, the operation could attempt to delete more documents from the source than `max_docs` until it has successfully deleted `max_docs` documents, or it has gone through every document in the source query.
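A toy simulation can model how internal versioning, `conflicts`, and `max_docs` interact. This is a hypothetical model, not Elasticsearch code. Each `Doc` records the version seen in the snapshot and the version at delete time, and any mismatch counts as a version conflict. With `conflicts="abort"` (the parameter's default value), processing stops at the first conflict without undoing earlier deletes. With `"proceed"`, the conflict is counted and processing continues until `max_docs` deletions succeed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    snapshot_version: int  # version captured when the request started
    current_version: int   # version when the delete is processed


@dataclass
class Outcome:
    deleted: int = 0
    version_conflicts: int = 0
    attempted: int = 0
    aborted: bool = False


def simulate_delete_by_query(
    docs: list[Doc], conflicts: str = "abort", max_docs: Optional[int] = None
) -> Outcome:
    out = Outcome()
    for doc in docs:
        if max_docs is not None and out.deleted >= max_docs:
            break  # enough successful deletes
        out.attempted += 1
        if doc.current_version != doc.snapshot_version:
            out.version_conflicts += 1  # changed after the snapshot
            if conflicts == "abort":
                out.aborted = True  # earlier deletes are not rolled back
                break
            continue  # conflicts="proceed": count it and move on
        out.deleted += 1
    return out


# d1 and d3 were updated after the snapshot was taken.
docs = [Doc(f"d{i}", 1, 2 if i in (1, 3) else 1) for i in range(6)]
print(simulate_delete_by_query(docs, "abort", max_docs=3))
# Outcome(deleted=1, version_conflicts=1, attempted=2, aborted=True)
print(simulate_delete_by_query(docs, "proceed", max_docs=3))
# Outcome(deleted=3, version_conflicts=2, attempted=5, aborted=False)
```

In the `proceed` run, five documents are attempted to reach three successful deletes, which is the effect described in the note above.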
Throttling delete requests
+To control the rate at which delete by query issues batches of delete operations, you can set `requests_per_second` to any positive decimal number.
+ This pads each batch with a wait time to throttle the rate.
+ Set `requests_per_second` to `-1` to disable throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account.
+ The padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.
+ By default the batch size is `1000`, so if `requests_per_second` is set to `500` and writing the batch takes .5 seconds:
+
+ target_time = 1000 / 500 per second = 2 seconds
+ wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+
+ Since the batch is issued as a single `_bulk` request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set.
+ This is "bursty" instead of "smooth".
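A small function can check the worked example above. It is a sketch under two assumptions. First, the wait is clamped at zero when writing takes longer than the target time. Second, `-1` is treated as "no throttling", per the text. The function name is hypothetical.

```python
def throttle_wait_seconds(
    requests_per_second: float, write_time: float, batch_size: int = 1000
) -> float:
    """Padding added after a batch so the average rate matches requests_per_second."""
    if requests_per_second == -1:
        return 0.0  # -1 disables throttling
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_time = batch_size / requests_per_second
    # Assumption: never wait a negative amount of time.
    return max(0.0, target_time - write_time)


print(throttle_wait_seconds(500, write_time=0.5))  # 1.5
```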
Slicing
+Delete by query supports sliced scroll to parallelize the delete process.
+ This can improve efficiency and provide a convenient way to break the request down into smaller parts.
+Setting `slices` to `auto` lets Elasticsearch choose the number of slices to use.
+ This setting will use one slice per shard, up to a certain limit.
+ If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.
+ Adding slices to the delete by query operation creates sub-requests, which means it has some quirks:
+
+* Rethrottling the request with `slices` will rethrottle the unfinished sub-requests proportionally.
+* Canceling the request with `slices` will cancel each sub-request.
+* Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
+* `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the earlier point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being deleted.
+
+If you're slicing manually or otherwise tuning automatic slicing, keep in mind that too many `slices` hurts performance.
+ Setting `slices` higher than the number of shards generally does not improve efficiency and adds overhead.
+ Whether query or delete performance dominates the runtime depends on the documents being deleted and cluster resources.
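A toy model shows why `max_docs` with `slices` may fall short. The model rests on hypothetical assumptions for illustration. `max_docs` is split with integer division, and the remainder goes to the first slices. `requests_per_second` is divided evenly. Each sub-request can delete at most the documents in its own slice. The per-slice quotas add up to `max_docs`, yet an uneven slice still leaves part of its quota unused.

```python
def per_slice_share(value: int, slices: int) -> list[int]:
    """Split an integer budget (such as max_docs) across sub-requests.

    Hypothetical rule: integer division, remainder spread over the first slices.
    """
    base, rest = divmod(value, slices)
    return [base + (1 if i < rest else 0) for i in range(slices)]


def rate_per_slice(requests_per_second: float, slices: int) -> float:
    """Each sub-request gets a proportional share of the throttle."""
    return requests_per_second / slices


def deleted_with_slices(max_docs: int, slice_sizes: list[int]) -> int:
    """Total deletions when each slice stops at its quota or its own size."""
    quotas = per_slice_share(max_docs, len(slice_sizes))
    return sum(min(quota, size) for quota, size in zip(quotas, slice_sizes))


print(per_slice_share(10, 2))            # [5, 5]
print(rate_per_slice(500, 2))            # 250.0
print(deleted_with_slices(10, [12, 3]))  # 8 (fewer than max_docs=10)
```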
+Cancel a delete by query operation
+Any delete by query can be canceled using the task cancel API. For example:
+POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
+
+ The task ID can be found by using the get tasks API.
+Cancellation should happen quickly but might take a few seconds.
+ The get task status API will continue to list the delete by query task until this task checks that it has been canceled and terminates itself.
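The cancel example can be paired with a lookup of running delete by query tasks. This sketch only builds request paths and sends nothing. The helper names are hypothetical. The `actions=*/delete/byquery` filter for the get tasks API is assumed to match the one used in the Elasticsearch delete by query documentation. The task ID format `<node_id>:<task_number>` follows the example above.

```python
from urllib.parse import quote, urlencode


def list_delete_by_query_tasks_path() -> str:
    """Path for the get tasks API, filtered to delete by query actions."""
    params = {"detailed": "true", "actions": "*/delete/byquery"}
    # Keep '*' and '/' readable instead of percent-encoding them.
    return "/_tasks?" + urlencode(params, safe="*/")


def cancel_task_path(task_id: str) -> str:
    """Path for the task cancel API; task IDs look like '<node_id>:<task_number>'."""
    return f"/_tasks/{quote(task_id, safe=':')}/_cancel"


print(list_delete_by_query_tasks_path())
# /_tasks?detailed=true&actions=*/delete/byquery
print(cancel_task_path("r1A2WoRbTwKZ516z6NEs5A:36619"))
# /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
```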