Intermittent RPC Failed while cloning from github.com #111
I tried adding this code to
I think adding Ref: https://stackoverflow.com/questions/50305112/pip-install-timeout-issue
Unfortunately the timeout change didn't seem to help. I'm still seeing the RPC failure during pip install.
That would be very strange if it only happens on ARC, as I can't see why changing the runner would cause git clone to fail...
I do feel like this is related to some kind of git timeout while cloning the repo. If you look at the timestamps, the failure always happens after roughly 35 seconds (I'm not sure whether the 12.25 ... 47.55 values are in seconds or some other unit), and that has held in every instance of the failure I've seen. Maybe setting the timeout for pip install isn't enough.
I suspect this doesn't only happen in ARC, but we probably see it more often there because ARC builds are not cached like the non-ARC builds. The calculate-docker action seems to run every time in ARC (which is yet another issue with ARC). So I think this is a problem worth figuring out, even if at lower priority, for non-ARC jobs as well, since fixing it will make the build more stable.
There are more problems with external dependencies; I have this open issue about them: pytorch/pytorch#124825
I'm seeing intermittent failures when cloning the triton and ucx packages from github.com while running tests in ARC. I'm not sure whether it affects non-ARC runners, since I haven't done any work on that side. These failures are fairly frequent and quite annoying: re-running the job a few times eventually gets them to pass, but it would be good to have a solution that doesn't require so many re-runs.
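Since re-running the job eventually succeeds, one possible stopgap until the root cause is found is to retry the failing clone automatically instead of re-running the whole job. The sketch below is a hypothetical helper (`run_with_retries` is not part of any existing script here) that reruns a command with exponentially growing delays; the attempt count and base delay are arbitrary assumptions, not tuned values.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def _default_runner(cmd: Sequence[str]) -> int:
    # Run the command for real and return its exit code.
    return subprocess.run(list(cmd)).returncode


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 5.0,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `cmd` up to `attempts` times, doubling the wait after each failure.

    Returns 0 as soon as one attempt succeeds, otherwise the exit code of
    the last attempt. `runner` and `sleep` are injectable for testing.
    """
    if runner is None:
        runner = _default_runner
    delay = base_delay
    code = 1
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0:
            return 0
        if attempt < attempts:
            # Back off before the next attempt (hypothetical policy).
            print(f"attempt {attempt} failed with exit code {code}; "
                  f"retrying in {delay:.0f}s")
            sleep(delay)
            delay *= 2
    return code
```

For example, `run_with_retries(["git", "clone", REPO_URL])`, where `REPO_URL` is a placeholder for the triton or ucx repository. Note that retrying only hides the intermittent failure; it doesn't explain why the clone dies after about 35 seconds.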