custom_token_chars in edge_ngram tokenizer throws transport exception #7991

Open
@michael-celani

Elastic.Clients.Elasticsearch version: 8.11.0

Elasticsearch version: 8.8.2

Operating system version: Windows 10

Description of the problem including expected versus actual behavior:
Elasticsearch-net can't deserialize the custom_token_chars field of an edge_ngram tokenizer. The client declares the field as string?, but the index settings shown below contain it as an array of strings, so deserialization throws:

{"The JSON value could not be converted to System.String. Path: $.custom_token_chars | LineNumber: 0 | BytePositionInLine: 65."}

"settings": {
        "index": {
          "max_ngram_diff": "30",
          "routing": {
            "allocation": {
              "include": {
                "_tier_preference": "data_content"
              }
            }
          },
          "refresh_interval": "45s",
          "number_of_shards": "1",
          "provided_name": "index",
          "creation_date": "1700171133856",
          "analysis": {
            "analyzer": {
              "autocomplete": {
                "filter": [
                  "trim",
                  "lowercase"
                ],
                "tokenizer": "autocomplete"
              },
              "autocomplete_search": {
                "filter": [
                  "trim",
                  "lowercase"
                ],
                "tokenizer": "keyword"
              },
              "common_analyzer": {
                "filter": [
                  "trim",
                  "lowercase"
                ],
                "type": "custom",
                "tokenizer": "keyword"
              }
            },
            "tokenizer": {
              "autocomplete": {
                "token_chars": [
                  "letter",
                  "digit",
                  "custom"
                ],
                "custom_token_chars": [
                  "()[]-&+"
                ],
                "min_gram": "1",
                "type": "edge_ngram",
                "max_gram": "20"
              }
            }
          },
          "number_of_replicas": "1",
          "uuid": "uuid",
          "version": {
            "created": "8080299"
          }
        }
      }
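
To make the mismatch concrete, here is a minimal sketch in Python (not the client's C# code) that parses the tokenizer fragment from the settings above. `read_strict` mimics a model that only accepts a string, and `read_tolerant` is a hypothetical fix that accepts either shape. Both function names are illustrative, and joining the array elements into one string is an assumption about the intended semantics, not something the client is known to do.

```python
import json
from typing import Optional

# Tokenizer fragment copied from the settings in this report.
TOKENIZER_JSON = """
{
  "token_chars": ["letter", "digit", "custom"],
  "custom_token_chars": ["()[]-&+"],
  "min_gram": "1",
  "type": "edge_ngram",
  "max_gram": "20"
}
"""


def read_strict(tokenizer: dict) -> Optional[str]:
    """Mimic a model that declares custom_token_chars as a nullable string."""
    value = tokenizer.get("custom_token_chars")
    if value is not None and not isinstance(value, str):
        raise TypeError(
            f"custom_token_chars: expected string, got {type(value).__name__}"
        )
    return value


def read_tolerant(tokenizer: dict) -> Optional[str]:
    """Hypothetical fix: accept a string or an array of strings."""
    value = tokenizer.get("custom_token_chars")
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        # Assumption: the array elements are concatenated into one character set.
        return "".join(value)
    raise TypeError(
        f"custom_token_chars: unsupported type {type(value).__name__}"
    )


if __name__ == "__main__":
    tokenizer = json.loads(TOKENIZER_JSON)
    try:
        read_strict(tokenizer)
    except TypeError as exc:
        print("strict read failed:", exc)
    print("tolerant read:", read_tolerant(tokenizer))
```

In the .NET client the equivalent change would presumably be a converter on the property that accepts both a JSON string and a JSON array, but that is a guess at the fix rather than a description of the library.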

Steps to reproduce:

  1. Attempt to get the IndexState for an index whose edge_ngram tokenizer defines custom_token_chars, as in the settings above.

Expected behavior
The index settings deserialize without throwing, including the custom_token_chars array.
