Description
Motivation
Today with the Elasticsearch client you have to write a JSON blob to the body
to pass any values within the HTTP body.
Unfortunately this is where some of the most complex data structures are for the Elasticsearch API (see query DSL, aggregations)
which means we're unable to provide a good window into this opaque object via types or auto-complete.
This is what writing a search query looks like today:
from elasticsearch import Elasticsearch
client = Elasticsearch("https://localhost:9200")
client.search(
index="test-index",
size=10,
body={
"runtime_mappings": {
"day_of_week": {
"type": "keyword",
"script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
}
},
"aggs": {
"day_of_week_count": {
"value_count": {
"field": "day_of_week"
}
}
}
}
)
And here is what the type hints for the search
API look like for today:
def search(
index: Optional[Any] = ...,
body: Optional[Any] = ...,
size: Optional[Any] = ...,
...,
): ...
Something to note for later: the size
parameter is always serialized in the query string in this example.
Proposed Solution
With the expanded JSON body fields implemented the same search
API call can be written like so:
client.search(
index="test-index",
size=10,
runtime_mappings={
"day_of_week": {
"type": "keyword",
"script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
}
},
aggs={
"day_of_week_count": {
"value_count": {
"field": "day_of_week"
}
}
}
)
Notice how the body fields and fields that are serialized elsewhere like index
are all at the same level and you're writing actual Python code instead of wrangling with JSON at the top-level of the API.
And here is what the function signature and types for the search
API would look like:
def search(
index: Optional[Union[str, List[str]]] = ...,
size: Optional[int] = ...,
runtime_mappings: Optional[Mapping[str, Any]] = ...,
aggs: Optional[Mapping[str, Any]] = ...
...,
body: Optional[Any] = ...,
): ...
The differences that I want to highlight:
- There is still a
body
parameter for backwards compatibility with queries written before this change. This won't be a breaking change, any queries that are using thebody
parameter will still function exactly as they do today. Thebody
parameter will be deprecated on APIs that support expanded body fields. - Using the Elasticsearch specification we get better type hints for parameters that already exist. You can see this example best with the
index
andsize
parameters, previously they wereOptional[Any]
and now they areOptional[Union[str, List[str]]
andOptional[int]
respectively. This gives you a lot more confidence when writing API calls that you're using the right types the first time. - Not shown, but the
size
parameter would be encoded in the HTTP request body instead of the query string. This is an improvement over the current serialization strategy because space within the HTTP request target (URL path + query) is limited. In the past we've seen errors from users using Scroll IDs which can be quite verbose and currently have a work-around to serialize Scroll IDs in the HTTP body. Another motivator is that any value serialized in the query string instead of the request body isn't effected by HTTP body compression.
Nuances and Future Improvements
These improvements are a first-step towards a fully-typed Elasticsearch Python client and will let users start down the path of adding richer types their Elasticsearch code. However there are a few things to note:
When will DeprecationWarnings
start?
I'm unsure if they should be emitted in the next release (7.15) or if I should wait until more APIs are supported. Tough balance between alerting users to upcoming changes and new features and being difficult to avoid DeprecationWarnings
in general usage of the client.
Not all APIs can take advantage right away
For an API to have expanded body fields it must be completed defined within the Elasticsearch specification. There are a lot of APIs that aren't completely defined yet so rollout of this change may take some time while the specification is filled.
In these cases APIs will be generated using the previous process and have a simple body
parameter that won't raise a DeprecationWarning
for the time being.
When the body
parameter is used, return to old behavior
This means that all previous code written will use the old behavior until updated to not use the body
parameter. A DeprecationWarning
will be raised in cases where the body
parameter could be replaced by expanded fields.
If both a body field and the body
parameter are defined a ValueError
will be raised as this configuration isn't supported.
Some fields that were once serialized to the query are now serialized to the body
This is true! However I believe that API compatibility won't be broken by this behavior change as Elasticsearch will treat the two different serializations the same. Some examples of this are _source
, size
, from
, etc.
Deeply nested objects are still untyped
For example the runtime_mappings
object is typed as Mapping[str, Any]
where instead it should be mapped as Mapping[str, TypedDict[type: Union[Mapping[str, Any], str], format: str, script: Optional[str]]]
.
The aim is to reduce the amount of Any
types in type hints if possible but for this initial implementation using only built-in Python types as this is as far as we can define in the general case. This means only using scalars (int
, float
, str
, etc), and Union
, Optional
, List
, Mapping
, and Any
.
typing.TypedDict
is a newly added feature to Python 3.8 but in terms of ability to describe the complex structures of Elasticsearch API is missing a few critical features. Specifically the ability to mark one or more keys as "optional" to include without using the total=False
parameter which makes all keys optional. I'll continue to watch the Python typing space for additional improvements there.
In the future defining our own objects and types may be required to represent these complex types.
What about conflicts with per-request parameters and body fields?
Parameters like api_key
are a parameter on every API in order to define different authentication config per request.
We want to continue to support code written this way so for now APIs that have a conflict between per-request parameters
and body fields will continue to use the old behavior of a single body
parameter.
APIs that have these conflicts will not have expanded body fields for now, there is a future improvement in the works to solve this issue.
What about bodies that aren't JSON?
These APIs won't be changed and will continue to have a body
parameter for 7.x.
The future is keyword-only
In Python 3 keyword-only arguments were added which allowed making functions automatically raise a TypeError
if called without using that argument as a keyword argument. This is a fantastic feature because if makes all code written with a library much more readable and maintainable. It also makes my job as a library maintainer much easier as I no longer have to worry about breaking code wrt. the order of parameters, only that they are there.
Currently type stubs define all parameters as keyword-only (except required path parameters) but because these are stubs and not function signatures there's no enforcement of this unless you opt-in with mypy or another type-checking tool.
Starting in 8.x keyword-only arguments will be used for all parameters so users should switch over to keyword arguments (as has always been recommended) as soon as possible!