Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Min Runners Scaling comparison delay cause additional runner to be created and terminated unnecessarily #3866

Open
4 tasks done
rekha-prakash-maersk opened this issue Jan 2, 2025 · 1 comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Comments

@rekha-prakash-maersk
Copy link

Checks

Controller Version

0.9.3 , 0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Set minRunners to 1 (Now we have one runner registered - Runner A)
2. Trigger two jobs simultaneously
3. Now we have two more runners registered ( Runners B & C  online - Total 3)
4. The Runners A and B received the jobs and C is waiting for the job maintaining min runner as 1
5. A complete the job and terminates, now we have new runner D spinning up.
6. Now B and C also gets terminated, and D remains..

Here C is unnecessarily terminated because of the latency in the scaling calculation. it was not executing any job, it was on stand by as per minRunners setup.

Describe the bug

when we set up minRunners, the additional runner which is in standby also gets terminated without even executing any job due to delay in scaling comparison of desired runners

Describe the expected behavior

In this case, the Runner C should remain active and Runner D should not have get created.

Additional Context

minRunners:1
maxRunners:150

Controller Logs

START OF THE 2 JOBS:

EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 3, "desired": 3}  │


AFTER ONE JOB COMPLETED, RUNNEr A TERMINATED (note , desired still shows 3)


EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 3}   │

NOW RUNNER D GET CREATED(UNNECESSARY RUNNER), 

EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 3, "desired": 2}   │

NOW ANOTHER JOB COMPLETED, ONLY RUNNER C and D ARE ACTIVE, STILL DELAY IN UPDATING DESIRED

EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 2}    │


NOW WE HAVE RUNNER C and D BOTH ARE WAITING FOR NEW JOB

 EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 1} │


NOW RUNNER C GOT DELETED 

 EphemeralRunnerSet    Scaling comparison    {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 1, "desired": 1} │

Runner Pod Logs

LOGS OF RUNNER C waiting for job and gets terminated due to scaling comparison


[RUNNER 2025-01-02 16:42:01Z INFO Terminal] WRITE LINE: 

√ Connected to GitHub
[RUNNER 2025-01-02 16:42:01Z INFO Terminal] WRITE LINE: 

[RUNNER 2025-01-02 16:42:01Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:01Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:01Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-01-02 16:42:02Z INFO MessageListener] Session created.
[RUNNER 2025-01-02 16:42:02Z INFO Terminal] WRITE LINE: Current runner version: '2.321.0'
Current runner version: '2.321.0'
[RUNNER 2025-01-02 16:42:02Z INFO Terminal] WRITE LINE: 2025-01-02 16:42:02Z: Listening for Jobs
2025-01-02 16:42:02Z: Listening for Jobs
[RUNNER 2025-01-02 16:42:02Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[RUNNER 2025-01-02 16:42:02Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:02Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
time="2025-01-02T16:53:24.543067339Z" level=info msg="Processing signal 'terminated'"
time="2025-01-02T16:53:24.544010095Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2025-01-02T16:53:24.544396972Z" level=info msg="Daemon shutdown complete"
time="2025-01-02T16:53:24.544453030Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2025-01-02T16:53:24.544463570Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Exiting runner...
[RUNNER 2025-01-02 16:53:24Z INFO Terminal] WRITE LINE: Exiting...
Exiting...
[RUNNER 2025-01-02 16:53:24Z INFO ConfigurationStore] IsServiceConfigured()
[RUNNER 2025-01-02 16:53:24Z INFO ConfigurationStore] IsServiceConfigured: False
[RUNNER 2025-01-02 16:53:24Z INFO HostContext] Runner will be shutdown for UserCancelled
[RUNNER 2025-01-02 16:53:24Z WARN GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=ee4b09ac-0196-4a07-b068-f631314baf4b&status=Online&runnerVersion=2.321.0&os=Linux&architecture=X64&disableUpdate=true has been cancelled.
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer] Catch exception during request
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]  ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]  ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]  ---> System.Net.Sockets.SocketException (125): Operation canceled
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    --- End of inner exception stack trace ---
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken, Int32 estimatedSize)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR  BrokerServer]    at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
@rekha-prakash-maersk rekha-prakash-maersk added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Jan 2, 2025
Copy link
Contributor

github-actions bot commented Jan 2, 2025

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers
Projects
None yet
Development

No branches or pull requests

1 participant