fix: improve test startup logic #89
Conversation
Hi @victorlcm, thank you for catching this case and fixing it! 👏 Just as a heads-up, I'll be able to fully test the logic of this PR only next week. But at first glance, it looks fine to me 👍
@victorlcm I've been looking at this PR more closely today and I'd like to clarify the following with you: in your use case, do you run it with the separate flag in the YAML configuration?
So far, all seems good, though there is one more thing I request you to adjust, please see the comment 🙂 I'll check a few more things in the meantime.
Co-authored-by: yorugac <[email protected]>
Hey @yorugac, we're running with separate.
Hi @victorlcm, thanks for the fix, but I'm afraid that's not all yet 😂 I've found one more issue in the current approach, and we really should try to fix the logic there. Let me explain. The operator uses k6's REST API for starting runners, but it does so with hostnames, not IPs. The hostnames come from k6-operator/controllers/k6_start.go, lines 59 to 61 at d268e0e.
In contrast, your PR relies on the IP of the runner. This approach is incorrect for the cases when there is more than one runner on the same node. I assume you haven't encountered such an error during your usage, but it can happen otherwise. Basically, the readiness check should rely on the same hostnames. Then the checks become separate:
Does it make sense? Could you please try this updated logic with your use case, to see if it continues to work?
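A minimal sketch of the hostname-based approach described above (the helper name `runnerServiceURL` is hypothetical, not the operator's actual code; port 6565 is k6's REST API default):

```go
package main

import "fmt"

// runnerServiceURL builds the URL of a runner's k6 REST API from its Service
// name, mirroring how the operator addresses runners by hostname rather than
// by pod IP. With hostnames, two runners scheduled on the same node still
// resolve to distinct endpoints.
func runnerServiceURL(serviceName string) string {
	return fmt.Sprintf("http://%s:6565/v1/status", serviceName)
}

func main() {
	// One URL per runner service.
	for i := 1; i <= 2; i++ {
		fmt.Println(runnerServiceURL(fmt.Sprintf("k6-test-%d", i)))
	}
}
```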
Hey @yorugac, I'm sorry for the delay; this is a busy week for me. Your point makes sense! I'll make the changes later this week or early next week, is that ok? Thanks for the great review!
@victorlcm NP, that'll be great! Looking forward to your update 🙂
Hey @yorugac, I've changed it to use the service approach, and it seems to be working fine now. Can you test it again? Thanks! 😄
@yorugac given the way this PR works, if it gets approved, could further changes also remove the starter pod and add logic to the operator to curl the runners directly?
Hi @knmsk, good question. I think what you described is an option to move away from curl completely (#87), but it'd result in removing a sync point from the logic. I.e. right now the starter uses basically one command and acts as a synchronization point, and removing it would create potential delays in when runners actually start. AFAIK, some setups might depend on the starter having separate security requirements (e.g. see #74). These points might be negligible in most cases, or they might depend on the exact implementation, but I believe the implications should be thought through before going that route.
This makes a lot of sense, I will create a separate issue to discuss this topic :)
Hi @victorlcm, thank you for the update! I tested it yesterday and I think it's good to go now 👍 😄
There is one thing about domains that I wonder about: starters right now use a short form, http://${service.ObjectMeta.Name}:6565 🤔 But in this PR's check, the namespace is likely needed indeed... ATM, I'm not aware of anywhere this can fail, but we should probably settle on a consistent DNS scheme for this sometime.
Thanks again for all your work!
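To illustrate the two DNS forms being discussed, here is a hypothetical sketch (the helper names are not the operator's actual code): the short service name resolves only from within the same namespace, while the fully qualified form is unambiguous cluster-wide.

```go
package main

import "fmt"

// shortName uses the bare Service name, which Kubernetes DNS resolves only
// for clients in the same namespace as the Service.
func shortName(service string) string {
	return fmt.Sprintf("http://%s:6565", service)
}

// fqdn uses the fully qualified form, which resolves from any namespace in
// the cluster (assuming the default "cluster.local" cluster domain).
func fqdn(service, namespace string) string {
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:6565", service, namespace)
}

func main() {
	fmt.Println(shortName("k6-test-1"))
	fmt.Println(fqdn("k6-test-1", "load-testing"))
}
```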
Hello everyone,
We've been using this operator for a while, and we observed strange behavior in tests that spend a long time in the init phase. We noticed that they remained in a paused state because the k6 HTTP server was not ready yet. So we added a simple verification to make sure the HTTP server is up before sending the start requests to the jobs. It's a simple fix, but the results were very effective.