You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- **`task_type` (Enum("chat_completion"))**: The type of the inference task that the model will perform.
7793
-
NOTE: The `chat_completion` task type only supports streaming and only through the _stream API.
7794
-
- **`eis_inference_id` (string)**: The unique identifier of the inference endpoint.
7795
-
- **`service` (Enum("elastic"))**: The type of service supported for the specified task type. In this case, `elastic`.
7796
-
- **`service_settings` ({ model_id, rate_limit })**: Settings used to install the inference model. These settings are specific to the `elastic` service.
returnawaitthis.transport.request({ path, method, querystring, body, meta },options)
631
-
}
632
-
633
566
/**
634
567
* Create an inference endpoint. When you create an inference endpoint, the associated machine learning model is automatically deployed if it is not already running. After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for `"state": "fully_allocated"` in the response and ensure that the `"allocation_count"` matches the `"target_allocation_count"`. Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources. IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
635
568
* @see {@link https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put | Elasticsearch API documentation}
@@ -1033,64 +966,6 @@ export default class Inference {
1033
966
returnawaitthis.transport.request({ path, method, querystring, body, meta },options)
1034
967
}
1035
968
1036
-
/**
1037
-
* Create an Elastic Inference Service (EIS) inference endpoint. Create an inference endpoint to perform an inference task through the Elastic Inference Service (EIS).
1038
-
* @see {@link https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-eis | Elasticsearch API documentation}
returnawaitthis.transport.request({ path, method, querystring, body, meta },options)
1092
-
}
1093
-
1094
969
/**
1095
970
* Create an Elasticsearch inference endpoint. Create an inference endpoint to perform an inference task with the `elasticsearch` service. > info > Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints, you only need to create the enpoints using the API if you want to customize the settings. If you use the ELSER or the E5 model through the `elasticsearch` service, the API request will automatically download and deploy the model if it isn't downloaded yet. > info > You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value. After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for `"state": "fully_allocated"` in the response and ensure that the `"allocation_count"` matches the `"target_allocation_count"`. Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
1096
971
* @see {@link https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-elasticsearch | Elasticsearch API documentation}
/** Can be one of immediate, urgent, high, normal, low, languid. Wait until all currently queued events with the given priority are processed. */
15567
15569
wait_for_events?: WaitForEvents
15568
15570
/** The request waits until the specified number N of nodes is available. It also accepts >=N, <=N, >N and <N. Alternatively, it is possible to use ge(N), le(N), gt(N) and lt(N) notation. */
15569
-
wait_for_nodes?: string | integer
15571
+
wait_for_nodes?: ClusterHealthWaitForNodes
15570
15572
/** A boolean value which controls whether to wait (until the timeout provided) for the cluster to have no shard initializations. Defaults to false, which means it will not wait for initializing shards. */
15571
15573
wait_for_no_initializing_shards?: boolean
15572
15574
/** A boolean value which controls whether to wait (until the timeout provided) for the cluster to have no shard relocations. Defaults to false, which means it will not wait for relocating shards. */
/** The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the ingest-geoip config directory. */
0 commit comments