Kubernetes OOMs appear as non-zero sigkills, adding support for treating these as OOMs
matt-aitken committed Feb 11, 2025
1 parent bb65b26 commit 036a506
Showing 1 changed file with 12 additions and 0 deletions.
apps/webapp/app/v3/services/completeAttempt.server.ts (+12, -0)
```diff
@@ -696,5 +696,17 @@ function isOOMError(error: TaskRunError) {
     return true;
   }
 
+  // For the purposes of retrying on a larger machine, we're going to treat this as an OOM error.
+  // This is what they look like if we're executing using k8s. They get corrected later, but by then it's too late.
+  // {"code": "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE", "type": "INTERNAL_ERROR", "message": "Process exited with code -1 after signal SIGKILL."}
+  if (
+    error.code === "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE" &&
+    error.message &&
+    error.message.includes("SIGKILL") &&
+    error.message.includes("-1")
+  ) {
+    return true;
+  }
+
   return false;
 }
```
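A minimal standalone sketch of the same check, assuming a simplified error type. `TaskRunErrorSketch`, `DIRECT_OOM_CODES`, `isOOMErrorSketch`, and the stricter regex variant are all hypothetical: the real `TaskRunError` union and the earlier branches of `isOOMError` are not shown in this diff, so only the new Kubernetes SIGKILL branch mirrors the commit.

```typescript
// Hypothetical, simplified stand-in for the TaskRunError union used in
// completeAttempt.server.ts. Only the fields this check reads are modelled.
type InternalErrorSketch = {
  type: "INTERNAL_ERROR";
  code: string;
  message?: string;
};

type BuiltInErrorSketch = {
  type: "BUILT_IN_ERROR";
  name: string;
  message: string;
};

type TaskRunErrorSketch = InternalErrorSketch | BuiltInErrorSketch;

// Assumption: codes that already report an OOM directly. The real list lives
// in the elided part of isOOMError and may differ.
const DIRECT_OOM_CODES: ReadonlySet<string> = new Set(["TASK_PROCESS_OOM_KILLED"]);

// Same substring test as the commit: the message mentions SIGKILL and "-1".
function looksLikeK8sSigkill(message: string | undefined): boolean {
  return !!message && message.includes("SIGKILL") && message.includes("-1");
}

// Stricter hypothetical variant that matches the sample message's wording,
// so "code -15" or "code -137" cannot satisfy the "-1" part of the check.
function looksLikeK8sSigkillStrict(message: string | undefined): boolean {
  return !!message && /code -1 after signal SIGKILL\b/.test(message);
}

function isOOMErrorSketch(error: TaskRunErrorSketch, strict = false): boolean {
  // Only internal errors carry a `code`; narrowing on `type` makes that explicit.
  if (error.type !== "INTERNAL_ERROR") return false;
  if (DIRECT_OOM_CODES.has(error.code)) return true;

  const matches = strict ? looksLikeK8sSigkillStrict : looksLikeK8sSigkill;
  return error.code === "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE" && matches(error.message);
}

// Usage: the payload from the commit's comment is treated as an OOM.
const k8sKill: TaskRunErrorSketch = {
  type: "INTERNAL_ERROR",
  code: "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE",
  message: "Process exited with code -1 after signal SIGKILL.",
};
console.log(isOOMErrorSketch(k8sKill)); // true
```

The commit's `includes("-1")` is a substring test, so on its own it would also match other negative codes that start with `-1`; requiring `SIGKILL` in the same message narrows it. The regex variant is one possible way to pin the match to the exact wording shown in the sample payload, at the cost of breaking if that wording ever changes.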
