Expose JSON body fields as typed parameters

## Motivation

Today with the Elasticsearch client you have to pass a JSON blob as the `body` parameter to send any values within the HTTP body.
Unfortunately this is where some of the most complex data structures in the Elasticsearch API live (see query DSL, aggregations),
which means we're unable to provide a good window into this opaque object via types or auto-complete.

This is what writing a search query looks like today:

```python
from elasticsearch import Elasticsearch
client = Elasticsearch("https://localhost:9200")

client.search(
    index="test-index",
    size=10,
    body={
        "runtime_mappings": {
            "day_of_week": {
                "type": "keyword",
                "script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
            }
        },
        "aggs": {
            "day_of_week_count": {
                "value_count": {
                    "field": "day_of_week"
                }
            }
        }
    }
)
```

And here is what the type hints for the `search` API look like today:

```python
def search(
    index: Optional[Any] = ...,
    body: Optional[Any] = ...,
    size: Optional[Any] = ...,
    ...,
): ...
```

Something to note for later: the `size` parameter is always serialized in the query string in this example.

## Proposed Solution

With the expanded JSON body fields implemented, the same `search` API call can be written like so:

```python
client.search(
    index="test-index",
    size=10,
    runtime_mappings={
        "day_of_week": {
            "type": "keyword",
            "script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
        }
    },
    aggs={
        "day_of_week_count": {
            "value_count": {
                "field": "day_of_week"
            }
        }
    }
)
```

Notice how the body fields and fields that are serialized elsewhere like `index` are all at the same level and you're writing actual Python code instead of wrangling with JSON at the top-level of the API.

And here is what the function signature and types for the `search` API would look like:

```python
def search(
    index: Optional[Union[str, List[str]]] = ...,
    size: Optional[int] = ...,
    runtime_mappings: Optional[Mapping[str, Any]] = ...,
    aggs: Optional[Mapping[str, Any]] = ...,
    ...,
    body: Optional[Any] = ...,
): ...
```

The differences that I want to highlight:

- There is still a `body` parameter for backwards compatibility with queries written before this change. This won't be a breaking change: any queries that use the `body` parameter will still function exactly as they do today. The `body` parameter will be deprecated on APIs that support expanded body fields.
- Using the [Elasticsearch specification](https://github.com/elastic/elasticsearch-specification) we get better type hints for parameters that already exist. You can see this best with the `index` and `size` parameters: previously they were `Optional[Any]` and now they are `Optional[Union[str, List[str]]]` and `Optional[int]` respectively. This gives you a lot more confidence when writing API calls that you're using the right types the first time.
- Not shown, but the `size` parameter would be encoded in the HTTP request body instead of the query string. This is an improvement over the current serialization strategy because space within the HTTP request target (URL path + query) is limited. In the past we've seen errors from users using Scroll IDs, which can be quite verbose, and we currently have a work-around to serialize Scroll IDs in the HTTP body. Another motivator is that any value serialized in the query string instead of the request body isn't affected by HTTP body compression.

## Nuances and Future Improvements

These improvements are a first step towards a fully-typed Elasticsearch Python client and will let users start down the path of adding richer types to their Elasticsearch code. However, there are a few things to note:

### When will `DeprecationWarnings` start?

I'm unsure if they should be emitted in the next release (7.15) or if I should wait until more APIs are supported. It's a tough balance between alerting users to upcoming changes and new features, and making `DeprecationWarnings` difficult to avoid in general usage of the client.

### Not all APIs can take advantage right away

For an API to have expanded body fields it must be completely defined within the [Elasticsearch specification](https://github.com/elastic/elasticsearch-specification). There are a lot of APIs that aren't completely defined yet, so rollout of this change may take some time while the specification is filled in.

In these cases APIs will be generated using the previous process and have a simple `body` parameter that won't raise a `DeprecationWarning` for the time being.

### When the `body` parameter is used, return to old behavior

This means that all previous code written will use the old behavior until updated to not use the `body` parameter. A `DeprecationWarning` will be raised in cases where the `body` parameter could be replaced by expanded fields.

If both a body field and the `body` parameter are defined a `ValueError` will be raised as this configuration isn't supported.
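
The rules in this section can be sketched roughly as follows. This is only an illustration: `resolve_body` is a hypothetical helper, not a function in the client, and the warning and error messages are assumptions.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_body(body: Optional[Any] = None, **body_fields: Any) -> Optional[Any]:
    """Hypothetical helper: choose between `body` and expanded body fields."""
    # Ignore expanded fields that the caller left unset.
    fields: Dict[str, Any] = {k: v for k, v in body_fields.items() if v is not None}

    if body is not None and fields:
        # Mixing both styles is not supported.
        raise ValueError(
            "Can't use 'body' together with body fields: %s" % sorted(fields)
        )
    if body is not None:
        # Old behavior: pass the body through unchanged, but warn.
        warnings.warn(
            "The 'body' parameter is deprecated, use individual parameters instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return body
    # New behavior: the expanded fields become the JSON body.
    return fields or None
```

With this shape, existing calls that only pass `body` keep working (with a warning), calls that only pass fields such as `size=10` get a body built for them, and calls that pass both fail early.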

### Some fields that were once serialized to the query are now serialized to the body

This is true! However I believe that API compatibility won't be broken by this behavior change as Elasticsearch will treat the two different serializations the same. Some examples of this are `_source`, `size`, `from`, etc.
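
To make the difference concrete, here is a small sketch of the two serializations for the `size` parameter from the earlier example. `build_search_request` is a hypothetical function written for this illustration using only the standard library; it is not how the client builds requests internally.

```python
import json
from typing import Any, Mapping, Optional, Tuple
from urllib.parse import urlencode


def build_search_request(
    index: str,
    query_params: Mapping[str, Any],
    body_fields: Mapping[str, Any],
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (request target, JSON body) for a search."""
    target = f"/{index}/_search"
    if query_params:
        target += "?" + urlencode(query_params)
    body = json.dumps(body_fields) if body_fields else None
    return target, body


aggs = {"day_of_week_count": {"value_count": {"field": "day_of_week"}}}

# Old behavior: `size` travels in the query string.
old_target, old_body = build_search_request("test-index", {"size": 10}, {"aggs": aggs})
# New behavior: `size` travels in the JSON body alongside `aggs`.
new_target, new_body = build_search_request("test-index", {}, {"size": 10, "aggs": aggs})

print(old_target)  # /test-index/_search?size=10
print(new_target)  # /test-index/_search
```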

### Deeply nested objects are still untyped

For example, the `runtime_mappings` object is typed as `Mapping[str, Any]` where ideally it would be typed as something like (in pseudo-notation) `Mapping[str, TypedDict[type: Union[Mapping[str, Any], str], format: str, script: Optional[str]]]`.

The aim is to reduce the number of `Any` types in type hints where possible, but this initial implementation uses only built-in Python types, as this is as far as we can define in the general case. This means only using scalars (`int`, `float`, `str`, etc.), and `Union`, `Optional`, `List`, `Mapping`, and `Any`.

[`typing.TypedDict`](https://www.python.org/dev/peps/pep-0589/) was added in Python 3.8, but it is missing a few features that are critical for describing the complex structures of the Elasticsearch API. Specifically, there is no way to mark one or more individual keys as "optional" without using the `total=False` parameter, which makes **all** keys optional. I'll continue to watch the Python typing space for additional improvements there.

In the future defining our own objects and types may be required to represent these complex types.

### What about conflicts with per-request parameters and body fields?

Parameters like `api_key` exist on every API in order to define different authentication config per request.
We want to continue to support code written this way so for now APIs that have a conflict between per-request parameters
and body fields will continue to use the old behavior of a single `body` parameter.

APIs that have these conflicts will not have expanded body fields for now; there is a future improvement in the works to solve this issue.

### What about bodies that aren't JSON?

These APIs won't be changed and will continue to have a `body` parameter for 7.x.

### The future is keyword-only

In Python 3 [keyword-only arguments](https://www.python.org/dev/peps/pep-3102/) were added, which allow a function to automatically raise a `TypeError` if such an argument isn't passed as a keyword argument. This is a fantastic feature because it makes all code written with a library much more readable and maintainable. It also makes my job as a library maintainer much easier as I no longer have to worry about breaking code with respect to the order of parameters, only that they are there.

Currently type stubs define all parameters as keyword-only (except required path parameters), but because these are stubs and not function signatures there's no enforcement of this unless you opt in with mypy or another type-checking tool.

Starting in 8.x keyword-only arguments will be used for all parameters so users should switch over to keyword arguments (as has always been recommended) as soon as possible!
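
For anyone unfamiliar with the syntax, here is a minimal sketch of keyword-only enforcement. The `search` function below is a hypothetical stand-in with a simplified signature, not the client's actual method.

```python
from typing import Any, Dict, Mapping, Optional


def search(
    *,  # everything after the bare `*` must be passed by keyword
    index: Optional[str] = None,
    size: Optional[int] = None,
    aggs: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for an API method with keyword-only parameters."""
    return {"index": index, "size": size, "aggs": aggs}


# Keyword arguments work, in any order.
result = search(size=10, index="test-index")

# Positional arguments are rejected by Python itself at call time.
try:
    search("test-index", 10)
except TypeError as exc:
    print("TypeError:", exc)
```

Code that already passes every parameter by keyword is unaffected by this change.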

