Linux CPU Build fails with can't connect to docker daemon #134

Open
seemethere opened this issue Apr 18, 2024 · 4 comments
Labels
workstream/linux-cpu Get CPU jobs working on linux

Comments

@seemethere
Member

https://github.com/pytorch/pytorch/actions/runs/8738940413/job/23979092050

Relevant lines:

Cannot connect to the Docker daemon at unix:///run/docker/docker.sock. Is the docker daemon running?
"docker stop" requires at least 1 argument.
See 'docker stop --help'.

Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]

Stop one or more running containers
"container prune" requires API version 1.25, but the Docker daemon API version is 1.24
@seemethere
Member Author

I have a feeling that this might be related: actions/actions-runner-controller#2999

@zxiiro
Collaborator

zxiiro commented Apr 22, 2024

@danibaibak mentioned on Slack that this appears to be intermittent. Retrying the failed jobs could allow them to pass. This is going to be harder to troubleshoot since it doesn't appear to happen on every run.

One theory right now: since dind-rootless mode runs the Docker daemon in a sidecar container, perhaps the sidecar is not ready yet when the ARC runner starts accepting jobs. I'm not sure how to validate that theory, though.
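
One way the readiness theory could be probed is to poll the Docker socket at the very start of a job and log how long it takes to accept connections. The following is only a minimal sketch under some assumptions: the socket path is copied from the error message above, the function names `socket_accepts_connections` and `wait_for_socket` are hypothetical, and a successful connect only shows that something is listening on the socket, not that the daemon API is fully healthy.

```python
import socket
import time
from typing import Optional

# Socket path copied from the error message in this issue.
DOCKER_SOCKET = "/run/docker/docker.sock"


def socket_accepts_connections(path: str) -> bool:
    """Return True if something is listening on the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(2.0)
        try:
            sock.connect(path)
        except OSError:
            # Covers a missing socket file, connection refused, and timeouts.
            return False
    return True


def wait_for_socket(
    path: str, timeout: float = 60.0, interval: float = 1.0
) -> Optional[float]:
    """Poll until the socket accepts connections.

    Returns the number of seconds waited, or None if `timeout` elapsed first.
    """
    start = time.monotonic()
    while True:
        if socket_accepts_connections(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

If `wait_for_socket(DOCKER_SOCKET)` were called as the first step of a job and the result logged, a nonzero wait on some runs (or `None` on the failing ones) would be consistent with the sidecar not being ready yet, while an immediate success on a run that still fails would point somewhere else.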

@ZainRizvi
Contributor

@zxiiro
Collaborator

zxiiro commented Apr 24, 2024

I don't know if this is related, but it sounds similar: actions/actions-runner-controller#3257

@ZainRizvi ZainRizvi added the workstream/linux-cpu Get CPU jobs working on linux label Apr 30, 2024
@ZainRizvi ZainRizvi added this to the ARC Runner Reliability milestone Apr 30, 2024