
Commit 6964476

committed
Migrate all user documentation to elastic.co
1 parent 6462574 commit 6964476

18 files changed: +638 −925 lines changed

docs/guide/configuration.asciidoc

Lines changed: 387 additions & 4 deletions
@@ -4,9 +4,392 @@
This page contains information about the most important configuration options of
the Python {es} client.

-* <<connection-pool>>
-* <<connection-selector>>

[discrete]
[[tls-and-ssl]]
=== TLS/SSL

-include::connection-pool.asciidoc[]
-include::connection-selector.asciidoc[]

The options in this section can only be used when the node is configured for HTTPS. An error will be raised if these options are used with an HTTP node.

[discrete]
==== Verifying server certificates

The typical route to verify a cluster certificate is via a "CA bundle" which can be specified via the `ca_certs` parameter. If no options are given and the https://github.com/certifi/python-certifi[certifi package] is installed then certifi's CA bundle is used by default.

If you have your own CA bundle to use you can configure it via the `ca_certs` parameter:

[source,python]
------------------------------------
es = Elasticsearch(
    "https://...",
    ca_certs="/path/to/certs.pem"
)
------------------------------------

If you're using a generated certificate or a certificate with a known fingerprint you can use `ssl_assert_fingerprint` to specify the fingerprint, which is matched against the server's leaf certificate during the TLS handshake. If any certificate matches, the connection is verified, otherwise a `TlsError` is raised.

In Python 3.9 and earlier only the leaf certificate will be verified but in Python 3.10+ private APIs are used to verify any certificate in the certificate chain. This helps when using certificates that are generated on a multi-node cluster.

[source,python]
------------------------------------
es = Elasticsearch(
    "https://...",
    ssl_assert_fingerprint=(
        "315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3"
    )
)
------------------------------------

To disable certificate verification use the `verify_certs=False` parameter. This option should be avoided in production; instead use the other options to verify the cluster's certificate.

[source,python]
------------------------------------
es = Elasticsearch(
    "https://...",
    verify_certs=False
)
------------------------------------

[discrete]
==== TLS versions

The minimum TLS version to connect with is configured via the `ssl_version` parameter. By default this is set to a minimum value of TLSv1.2. In Python 3.7+ you can use the new `ssl.TLSVersion` enumeration to specify versions.

[source,python]
------------------------------------
import ssl

# Python 3.6
es = Elasticsearch(
    ...,
    ssl_version=ssl.PROTOCOL_TLSv1_2
)

# Python 3.7+
es = Elasticsearch(
    ...,
    ssl_version=ssl.TLSVersion.TLSv1_2
)
------------------------------------

[discrete]
==== Client TLS certificate authentication

Elasticsearch can be configured to authenticate clients via TLS client certificates. The client certificate and key can be configured via the `client_cert` and `client_key` parameters:

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    client_cert="/path/to/cert.pem",
    client_key="/path/to/key.pem",
)
------------------------------------


[discrete]
==== Using an SSLContext

For advanced users an `ssl.SSLContext` object can be used for configuring TLS via the `ssl_context` parameter. The `ssl_context` parameter can't be combined with any other TLS options except for the `ssl_assert_fingerprint` parameter.

[source,python]
------------------------------------
import ssl

# Create and configure an SSLContext
ctx = ssl.create_default_context()
ctx.load_verify_locations(...)

es = Elasticsearch(
    ...,
    ssl_context=ctx
)
------------------------------------


[discrete]
[[compression]]
=== HTTP compression

Compression of HTTP request and response bodies can be enabled with the `http_compress` parameter.
If enabled then HTTP request bodies will be compressed with `gzip` and HTTP requests will include
the `Accept-Encoding: gzip` HTTP header. By default compression is disabled.

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    http_compress=True  # Enable compression!
)
------------------------------------

Enabling HTTP compression is recommended whenever requests traverse the network.


[discrete]
[[timeouts]]
=== Request timeouts

Requests can be configured to time out if they take too long to be serviced. The `request_timeout` parameter can be passed via the client constructor or the client `.options()` method. When a request times out the node will raise a `ConnectionTimeout` exception which can trigger retries.

Setting `request_timeout` to `None` will disable timeouts.

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    request_timeout=10  # 10 second timeout
)

# Search request will time out in 5 seconds
es.options(request_timeout=5).search(...)
------------------------------------

[discrete]
==== API and server timeouts

There are also API-level timeouts to take into consideration when making requests, which can cause the request to time out on the server side rather than the client side. You may need to configure both a transport-level and an API-level timeout for long-running operations.

In the example below there are three different configurable timeouts for the `cluster.health` API, all with different meanings for the request:

[source,python]
------------------------------------
es.options(
    # Amount of time to wait for an HTTP response to start.
    request_timeout=30
).cluster.health(
    # Amount of time to wait to collect info on all nodes.
    timeout=30,
    # Amount of time to wait for info from the master node.
    master_timeout=10,
)
------------------------------------


[discrete]
[[retries]]
=== Retries

Requests can be retried if they don't return a successful response. This provides a way for requests to be resilient against transient failures or overloaded nodes.

The maximum number of retries per request can be configured via the `max_retries` parameter. Setting this parameter to 0 disables retries. This parameter can be set in the client constructor or per-request via the client `.options()` method:

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    max_retries=5
)

# For this API request we disable retries with 'max_retries=0'
es.options(max_retries=0).index(
    index="blogs",
    document={
        "title": "..."
    }
)
------------------------------------

[discrete]
==== Retrying on connection errors and timeouts

Connection errors are automatically retried if retries are enabled. Retrying requests on connection timeouts can be enabled or disabled via the `retry_on_timeout` parameter. This parameter can be set on the client constructor or via the client `.options()` method:

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    retry_on_timeout=True
)
es.options(retry_on_timeout=False).info()
------------------------------------

[discrete]
==== Retrying status codes

By default if retries are enabled `retry_on_status` is set to `(429, 502, 503, 504)`. This parameter can be set on the client constructor or via the client `.options()` method. Setting this value to `False` or `()` will disable the default behavior.

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    retry_on_status=False
)

# Retry this API on '500 Internal Error' statuses
es.options(retry_on_status=[500]).index(
    index="blogs",
    document={
        "title": "..."
    }
)
------------------------------------

[discrete]
==== Ignoring status codes

By default an `ApiError` exception will be raised for any non-2XX HTTP response once retries, if any, have been exhausted. If you're expecting an HTTP error from the API but aren't interested in raising an exception you can use the `ignore_status` parameter via the client `.options()` method.

A good example where this is useful is setting up or cleaning up resources in a cluster in a robust way:

[source,python]
------------------------------------
es = Elasticsearch(...)

# API request is robust against the index not existing:
resp = es.options(ignore_status=404).indices.delete(index="delete-this")
resp.meta.status  # Can be either '2XX' or '404'

# API request is robust against the index already existing:
resp = es.options(ignore_status=[400]).indices.create(
    index="create-this",
    mappings={
        "properties": {"field": {"type": "integer"}}
    }
)
resp.meta.status  # Can be either '2XX' or '400'
------------------------------------

When using the `ignore_status` parameter the error response will be returned serialized just like a non-error response. In these cases it can be useful to inspect the HTTP status of the response, which you can do via `resp.meta.status`.

[discrete]
[[sniffing]]
=== Sniffing for new nodes

Additional nodes can be discovered by a process called "sniffing" where the client will query the cluster for more nodes that can handle requests.

Sniffing can happen at three different times: on client instantiation, before requests, and on a node failure. These three behaviors can be enabled and disabled with the `sniff_on_start`, `sniff_before_requests`, and `sniff_on_node_failure` parameters, as shown in the sketch below.

IMPORTANT: When using an HTTP load balancer or proxy you cannot use sniffing functionality, as the cluster would supply the client with IP addresses to connect to the cluster directly, circumventing the load balancer. Depending on your configuration this might be something you don't want or might break completely.

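As an illustrative sketch, the three parameters named above can be toggled independently on the client constructor (the particular combination shown here is only an example, not a recommended default):

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    sniff_on_start=True,          # sniff once when the client is created
    sniff_before_requests=False,  # don't sniff ahead of each request
    sniff_on_node_failure=True    # sniff again when a node fails
)
------------------------------------
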
[discrete]
==== Waiting between sniffing attempts

To avoid sniffing too often, there is a delay between attempts to discover new nodes. This value can be controlled via the `min_wait_between_sniffing` parameter, as in the sketch below.

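A minimal sketch, assuming `min_wait_between_sniffing` is expressed in seconds (the value of 60 here is just an example, not the default):

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    sniff_on_node_failure=True,
    # Wait at least 60 seconds between sniffing attempts
    # (example value, assuming the parameter is in seconds).
    min_wait_between_sniffing=60
)
------------------------------------
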
[discrete]
==== Filtering nodes which are sniffed

By default nodes which are marked with only a `master` role will not be used. To change this behavior, use the `sniff_filter` parameter.


[discrete]
[[node-pool]]
=== Node Pool

[discrete]
==== Selecting a node from the pool

You can specify a node selector pattern via the `node_selector_class` parameter. The supported values are `round_robin` and `random`. Default is `round_robin`.

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    node_selector_class="round_robin"
)
------------------------------------

Custom selectors are also supported:

[source,python]
------------------------------------
from elastic_transport import NodeSelector

class CustomSelector(NodeSelector):
    def select(self, nodes): ...

es = Elasticsearch(
    ...,
    node_selector_class=CustomSelector
)
------------------------------------

[discrete]
==== Marking nodes dead and alive

Individual nodes of Elasticsearch may have transient connectivity or load issues which may make them unable to service requests. To combat this the pool of nodes will detect when a node isn't able to service requests due to transport or API errors.

After a node's timeout has elapsed it will be moved back into the set of "alive" nodes, but only after the node returns a successful response is its count of consecutive errors reset.

The `dead_node_backoff_factor` and `max_dead_node_backoff` parameters can be used to configure how long the node pool will put the node into timeout with each consecutive failure. Both parameters use a unit of seconds.

The calculation is equal to `min(dead_node_backoff_factor * (2 ** (consecutive_failures - 1)), max_dead_node_backoff)`, illustrated below.

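As a quick illustration of that formula, the timeout doubles with each consecutive failure until it reaches the cap. The factor and cap used here are example values, not necessarily the client's defaults:

[source,python]
------------------------------------
# Example values only; not claimed to be the client's defaults.
dead_node_backoff_factor = 1.0  # seconds
max_dead_node_backoff = 30.0    # seconds

for consecutive_failures in range(1, 7):
    timeout = min(
        dead_node_backoff_factor * (2 ** (consecutive_failures - 1)),
        max_dead_node_backoff,
    )
    print(consecutive_failures, timeout)

# 1 -> 1.0s, 2 -> 2.0s, 3 -> 4.0s, 4 -> 8.0s, 5 -> 16.0s, 6 -> 30.0s (capped)
------------------------------------
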
[discrete]
[[serializer]]
=== Serializers

Serializers transform bytes on the wire into native Python objects and vice-versa. By default the client ships with serializers for `application/json`, `application/x-ndjson`, `text/*`, and `application/mapbox-vector-tile`.

You can define custom serializers via the `serializers` parameter:

[source,python]
------------------------------------
from typing import Any

from elasticsearch import Elasticsearch, JsonSerializer

class JsonSetSerializer(JsonSerializer):
    """Custom JSON serializer that handles Python sets"""
    def default(self, value: Any) -> Any:
        if isinstance(value, set):
            return list(value)
        return super().default(value)

es = Elasticsearch(
    ...,
    # Serializers are a mapping of 'mimetype' to Serializer class.
    serializers={"application/json": JsonSetSerializer}
)
------------------------------------


[discrete]
[[nodes]]
=== Nodes

[discrete]
==== Node implementations

The default node class for synchronous I/O is `urllib3` and the default node class for asynchronous I/O is `aiohttp`.

Any of the built-in HTTP node implementations like `urllib3`, `requests`, and `aiohttp` can be selected by passing a simple string to the `node_class` parameter:

[source,python]
------------------------------------
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ...,
    node_class="requests"
)
------------------------------------

You can also specify a custom node implementation via the `node_class` parameter:

[source,python]
------------------------------------
from elasticsearch import Elasticsearch
from elastic_transport import Urllib3HttpNode

class CustomHttpNode(Urllib3HttpNode):
    ...

es = Elasticsearch(
    ...,
    node_class=CustomHttpNode
)
------------------------------------

[discrete]
==== HTTP connections per node

Each node contains its own pool of HTTP connections to allow for concurrent requests. This value is configurable via the `connections_per_node` parameter:

[source,python]
------------------------------------
es = Elasticsearch(
    ...,
    connections_per_node=5
)
------------------------------------
