Description
Hi,
We have a cluster of 9 nodes, 3 of which are query nodes. During some GET _alias/aliasname calls, we see this exception:
A non-generic exception was through from the Elastic Search NEST client. ---> Elasticsearch.Net.Exceptions.MaxRetryException: Failed after retrying 2 times: 'GET _alias/codesearch_levion'
...
InnerException: WebException, InnerMessage: The request was aborted: The operation has timed out., InnerStackTrace: at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) in D:\GitRepos\elasticsearch-net\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 266
InnerException: WebException, InnerMessage: The request was aborted: The operation has timed out., InnerStackTrace: at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) in D:\GitRepos\elasticsearch-net\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 266
InnerException: WebException, InnerMessage: Unable to connect to the remote server, InnerStackTrace: at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) in D:\GitRepos\elasticsearch-net\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 266 ---> System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The request was aborted: The operation has timed out.
at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) in D:\GitRepos\elasticsearch-net\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 266
...
Could someone please clarify the following questions?
(1) Will the NEST retry mechanism redirect the call to a different query node after it has failed 3 times on a particular query node?
(2) What metrics generally indicate why a query node was unreachable at a given point in time? Where and how can we get this information?
Thanks, Divya