Host status test #216
Conversation
Is this addressing #199 as well?
When I run this, it fails with a There is not one cluster which includes node with FQDN error, because the playbook which is messing with the tendrl daemons is executed before the cluster reuse fixture.
Have you noticed this in your environment?
I guess it would be best to wait for https://gitlab.com/mbukatov/pytest-ansible-playbook/issues/4. Then you will be able to specify setup/teardown playbooks in the workload_stop_nodes fixture, which could help us make the ordering clearer.
Edit: it was failing because of this: Tendrl/node-agent#863. I use mixed naming unless testing requires otherwise, to catch issues like this ...
usmqe_tests/conftest.py
    procedure and as `result` is used number of nodes.
    """
    # wait for tendrl to notice that nodes are down
    time.sleep(240)
Let's add a log line which states what you stated in the comment: wait for tendrl to notice that nodes are down. It would make it clearer what's going on when one checks the logs while the test is running.
Also, it's failing for me. Will investigate. Is it expected?
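The suggested change could be sketched as follows (a hypothetical helper using the stdlib logging module; the actual usmqe fixture may log through a different logger):

```python
import logging
import time

LOGGER = logging.getLogger(__name__)


def wait_for_tendrl_node_status(wait_seconds=240):
    """Wait until Tendrl is expected to have noticed that nodes are down,
    logging the reason so the pause is visible in the test log."""
    LOGGER.info(
        "Waiting %d seconds for tendrl to notice that nodes are down",
        wait_seconds)
    time.sleep(wait_seconds)
```

With this in place, the 240-second pause shows up in the log with its reason instead of appearing as an unexplained hang.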
Tested on a cluster with nodes that have mixed naming. Should be stable now.
I have 2 problems with this test: the logging/reporting needs improvement, and it consistently fails for me:
usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure
Data mean should be 4, data mean in Graphite is: 6.0, applicable divergence is 1
usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure
Data mean should be 0.0, data mean in Graphite is: 2.0, applicable divergence is 1
------------------------------------------------------------
Failed Assumptions: 2, Passed Assumption: 22, Waived Assumption: 0
This happens on both mixed and all-FQDN-named clusters.
        (targets_used[0],),
        workload_stop_nodes["start"],
        workload_stop_nodes["end"],
        divergence=1)
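For context, the check under discussion compares the mean of a Graphite data series against an expected value within an allowed divergence. A simplified standalone version of that logic (a sketch, not the real graphiteapi implementation) might look like:

```python
def mean_within_divergence(datapoints, expected_mean, divergence=1):
    """Return (ok, actual_mean), where ok is True when the mean of
    datapoints lies within +/- divergence of expected_mean.

    None values (gaps in the Graphite series) are ignored.
    """
    values = [v for v in datapoints if v is not None]
    actual_mean = sum(values) / len(values)
    return abs(actual_mean - expected_mean) <= divergence, actual_mean
```

Under this scheme, a measured mean of 6.0 against an expected 4 with divergence 1 fails, which matches the assumption failures quoted above.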
The logging needs to be improved a bit. When this or other checks fail, I see:
[12:42:13,855] [ FAIL ] pytests_test:: usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure
Data mean should be 4, data mean in Graphite is: 6.0, applicable divergence is 1
which doesn't tell me which value the problem is with.
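One way to address this is to include the Graphite target in the failure message, so the log names the value under check (an illustrative helper; the actual reporting call in usmqe differs):

```python
def describe_mean_check(target, expected_mean, actual_mean, divergence):
    """Build an assumption-failure message that names the Graphite target
    whose data mean was checked."""
    return (
        "Mean of target '{0}' should be {1}, data mean in Graphite is: {2}, "
        "applicable divergence is {3}".format(
            target, expected_mean, actual_mean, divergence))
```

A failure then reads, e.g., "Mean of target 'cpu.percent-user' should be 4, ..." instead of an anonymous mean comparison (the target name here is only an example).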
Thanks for the improvements in logging. That is crucial for proper debugging. I realized that the failure I noticed previously in my environment was caused by 2 nodes missing from a particular group of my inventory file.
This test now fails because tendrl takes a lot of time to notice dead nodes.
This test uses playbooks from usmqe/usmqe-setup#224