Description
There are currently three ways to instrument an application that uses the Elasticsearch Python client:
- Using the Elastic APM Python agent
- Using the OpenTelemetry contrib elasticsearch-py instrumentation
- For completeness, I suppose using the Elastic Intake API manually is an option.
The main benefit of the existing approaches is that they can handle past versions of this client. However, they're fragile to changes in elasticsearch-py, are difficult to test, and may be suboptimal because the client code may need refactoring to emit spans with complete information. In addition, the OpenTelemetry instrumentations don't have clear owners. For these reasons, it makes sense to implement this in elasticsearch-py, where the feature will be tested and maintained.
We will follow the existing Semantic conventions for Elasticsearch and emit spans for all requests when the optional opentelemetry-api package is detected. Correctly configuring the OpenTelemetry SDK is left to the user.
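One common way to detect an optional package is a guarded import. The following is a minimal sketch, not the actual implementation: the helper name get_tracer_or_none and the tracer name "elasticsearch-api" are assumptions made for illustration. Only opentelemetry.trace.get_tracer is a real API call.

```python
try:
    # Provided by the optional opentelemetry-api package.
    from opentelemetry import trace
except ImportError:
    # Package not installed: instrumentation stays off.
    trace = None


def get_tracer_or_none(name="elasticsearch-api"):
    """Return a tracer if opentelemetry-api is importable, else None.

    The helper and the default tracer name are hypothetical.
    """
    if trace is None:
        return None
    return trace.get_tracer(name)


tracer = get_tracer_or_none()
if tracer is None:
    print("opentelemetry-api not detected, no spans will be emitted")
else:
    print("opentelemetry-api detected, spans can be emitted")
```

With this approach the client has no hard dependency on OpenTelemetry, and users who never install opentelemetry-api pay only for one failed import.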
The elasticsearch-ruby OpenTelemetry instrumentation will be used as inspiration:
- [OTel instrumentation] Add path params and endpoint in opts to perform_request elasticsearch-ruby#2179 makes sure to send enough metadata to the transport
- OTel instrumentation elastic-transport-ruby#54 performs the actual instrumentation
Configuration
OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_ENABLED
(default: true) Enable or disable the built-in OpenTelemetry instrumentation.
OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_CAPTURE_SEARCH_QUERY
(default: omit) Capture search request bodies. By default, the built-in OpenTelemetry instrumentation does not capture request bodies due to data privacy considerations. You can use this option to capture search queries from the request bodies of Elasticsearch search requests if you wish to gather this information regardless. The options are to capture the raw search query, sanitize the query with a default list of sensitive keys, or not capture it at all. Valid options: omit, sanitize, raw.
OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_SEARCH_QUERY_SANITIZE_KEYS
(default: None) Sanitize the Elasticsearch search request body. You can configure the list of keys whose values are redacted when the search query is captured. Values must be comma-separated.
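The three options above could be read and applied roughly as follows. This is a sketch under stated assumptions: the function names read_config and sanitize, the fallback to omit for unknown values, and the "?" redaction marker are all hypothetical. The default list of sensitive keys is not specified here, so the sketch redacts only user-supplied keys.

```python
import os
from typing import Any, List, Mapping

ENABLED_VAR = "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_ENABLED"
CAPTURE_VAR = "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_CAPTURE_SEARCH_QUERY"
SANITIZE_KEYS_VAR = (
    "OTEL_PYTHON_INSTRUMENTATION_ELASTICSEARCH_SEARCH_QUERY_SANITIZE_KEYS"
)


def read_config(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the three options from an environment-like mapping."""
    enabled = env.get(ENABLED_VAR, "true").strip().lower() != "false"
    capture = env.get(CAPTURE_VAR, "omit").strip().lower()
    if capture not in ("omit", "sanitize", "raw"):
        # Assumption: unknown values fall back to the privacy-safe default.
        capture = "omit"
    raw_keys = env.get(SANITIZE_KEYS_VAR, "")
    keys = [k.strip() for k in raw_keys.split(",") if k.strip()]
    return {"enabled": enabled, "capture": capture, "sanitize_keys": keys}


def sanitize(value: Any, keys: List[str], redacted: str = "?") -> Any:
    """Recursively replace the values of the given keys with a marker."""
    if isinstance(value, dict):
        return {
            k: redacted if k in keys else sanitize(v, keys, redacted)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v, keys, redacted) for v in value]
    return value


config = read_config(
    {CAPTURE_VAR: "sanitize", SANITIZE_KEYS_VAR: "password, token"}
)
body = {"query": {"match": {"password": "hunter2", "user": "bob"}}}
print(sanitize(body, config["sanitize_keys"]))
```

Passing the environment as a mapping keeps the parsing testable without mutating os.environ.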
Testing
We will have an "OpenTelemetry" test mode that will run the whole test suite with OpenTelemetry enabled and will test a simple API to make sure the exported spans are correct.
The InMemorySpanExporter allows end-to-end testing.
Endpoint id and path parts
The endpoint id and path parts will need to be passed from the client to the transport, which performs the actual instrumentation.
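To illustrate the idea, here is a small sketch of a client method handing this metadata to the transport. RecordingTransport and SketchClient are hypothetical stand-ins, not the real classes. The keyword names endpoint_id and path_parts follow the titles of the linked pull requests, but the exact signature is an assumption.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import quote


class RecordingTransport:
    """Hypothetical stand-in for the transport: records what it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def perform_request(
        self,
        method: str,
        path: str,
        *,
        endpoint_id: Optional[str] = None,
        path_parts: Optional[Dict[str, str]] = None,
    ) -> None:
        # A real transport would start a span here and use endpoint_id and
        # path_parts to fill in span attributes.
        self.calls.append(
            {
                "method": method,
                "path": path,
                "endpoint_id": endpoint_id,
                "path_parts": path_parts,
            }
        )


class SketchClient:
    """Hypothetical client with a single API method."""

    def __init__(self, transport: RecordingTransport) -> None:
        self._transport = transport

    def get(self, index: str, id: str) -> None:
        # Keep the unescaped values: they cannot be reliably recovered from
        # the URL-encoded path afterwards.
        path_parts = {"index": index, "id": id}
        path = f"/{quote(index, safe='')}/_doc/{quote(id, safe='')}"
        self._transport.perform_request(
            "GET", path, endpoint_id="get", path_parts=path_parts
        )


transport = RecordingTransport()
SketchClient(transport).get("my-index", "a/b")
print(transport.calls[0])
```

The point of passing both values explicitly is that the transport only sees a method and a rendered path, which is not enough to tell which API endpoint was called or which parts of the path were parameters.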
Steps
- Add minimal OpenTelemetry instrumentation elastic-transport-python#150
- Accept endpoint_id and path_params from client elastic-transport-python#151 and Pass endpoint_id and path_parts to transport #2457
- Add OpenTelemetry end-to-end test #2466
- Support db.statement, server and url attributes elastic-transport-python#155
- Type endpoint_id and path_parts as Optional elastic-transport-python#156 and Type endpoint_id and path_parts as Optional #2482
- Run OpenTelemetry integration test separately #2479
- Document and enable OpenTelemetry instrumentation #2491
- Sanitization
Supported attributes
Required
- db.system
- db.elasticsearch.path_parts
- db.operation
- http.request.method
- url.full
Recommended
- db.elasticsearch.cluster.name
- db.elasticsearch.node.name
- db.statement
- server.address
- server.port
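As an illustration of how most of these attributes could be derived from a single request, here is a minimal sketch. The function build_span_attributes and its parameters are hypothetical. Per the Elasticsearch semantic conventions, each path part is recorded under its own db.elasticsearch.path_parts.<key> attribute. The cluster and node name attributes are omitted because they are not derivable from the request alone.

```python
from typing import Dict, Mapping, Optional, Union
from urllib.parse import urlsplit

AttributeValue = Union[str, int]


def build_span_attributes(
    method: str,
    url: str,
    endpoint_id: str,
    path_parts: Mapping[str, str],
    statement: Optional[str] = None,
) -> Dict[str, AttributeValue]:
    """Build a span attribute dict for one request (hypothetical helper)."""
    attributes: Dict[str, AttributeValue] = {
        "db.system": "elasticsearch",
        "db.operation": endpoint_id,
        "http.request.method": method,
        "url.full": url,
    }
    # One attribute per path part, e.g. db.elasticsearch.path_parts.index.
    for key, value in path_parts.items():
        attributes[f"db.elasticsearch.path_parts.{key}"] = value

    parsed = urlsplit(url)
    if parsed.hostname:
        attributes["server.address"] = parsed.hostname
    if parsed.port is not None:
        attributes["server.port"] = parsed.port

    # Only set when search query capture is enabled (sanitize or raw).
    if statement is not None:
        attributes["db.statement"] = statement
    return attributes


attrs = build_span_attributes(
    "GET",
    "http://localhost:9200/my-index/_doc/1",
    endpoint_id="get",
    path_parts={"index": "my-index", "id": "1"},
)
for name in sorted(attrs):
    print(name, "=", attrs[name])
```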