Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeout to node stats #529

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

trk-bxn
Copy link

@trk-bxn trk-bxn commented Feb 4, 2022

  • Added fix for metric gabs on high load on single node.

With the elasticsearch version 7 and Opensearch version 1 there is a new timeout parameter implement.

Elasticsearch 6 ignores this timeout parameter and this change have no impact.

Tested with elasticsearch 6.8 and opensearch 1.2

Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html#cluster-nodes-stats-api-query-params

Opensearch
https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/rest-api-reference/#nodesinfo

resolve #508

Signed-off-by: trk-bxn <[email protected]>
Copy link
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the httpclient passed into collector.NewNodes already has this set on the client side. I think you could pull it from the http.Client.Timeout field so the signature does not need to change. I am also not sure what the benefit is of the http query string parameter. Can you explain how this would benefit users of this exporter?

@trk-bxn
Copy link
Author

trk-bxn commented Jun 15, 2022

Hi,
the benefit is that if one single node have a high response time e.g. high load or bad network connection. The request from exporter to the cluster return with this fix metrics of the other nodes. Currently the exporter return no data for all nodes in this situation and create a gap in Grafana. With this fix this Gab is only for this single node.

This are two different timeout option. The http.Client.Timeout stop if the request is to long. But in the background the requested node try to get the metrics from the other nodes. This backend request have a elasticsearch internal timeout of default 30 seconds. If now the http.Client timedout 5 secends the requested noded is not ready to send all node metrics and return then an empty response. This is a problem if you have more nodes.

The elasticsearch 7 and opensearch 1 support now to change the default internal timeout for node metrics request and fix the empty response. I defined now to set the elasticsearch internal timeout to the same timeout of the elasticsearch exporter.

@trk-bxn
Copy link
Author

trk-bxn commented Jun 15, 2022

this fix this problem in newer elasticsearch version
#330

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion for timeout method on ES 7+ and opensearch
2 participants