Adds Health gRPC Server and Refactors Main() #148
/lgtm
pkg/ext-proc/backend/datastore.go
	}
	ready = true
	return false
})
At startup, I think we want to ensure that the extension has synced with the API server and fetched the models, but readiness should not depend on at least one model being defined.
The health probe now uses a client to check the API server for the configured InferencePool and that at least one InferenceModel exists in the same namespace. Should this probe also check that at least one InferenceModel references the configured InferencePool?
I don't think that the health check needs to block on at least one InferenceModel. On the other hand, since the extension is currently 1:1 with an InferencePool, I think it makes sense to ensure that the extension successfully initialized the assigned InferencePool.
I think it makes sense to ensure that the extension successfully initialized the assigned InferencePool.
^ is the approach I took in the initial PR, e.g. check if InferencePool is nil in the data store. It also checked whether at least one InferenceModel referencing the configured InferencePool was stored, but that can be removed.
Oh, sorry, I missed that. I agree with the original approach! I would still remove the check on InferenceModel.
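For readers following the thread, the readiness rule agreed on here (ready once the assigned InferencePool is stored, with no dependency on InferenceModel) could look roughly like the sketch below. It assumes simplified, hypothetical stand-ins: `InferencePool`, `Datastore`, `SetInferencePool`, `Ready`, and `ErrPoolNotInitialized` are illustrative names, not the types or methods in `pkg/ext-proc/backend/datastore.go`.

```go
package main

import (
	"errors"
	"sync"
)

// InferencePool is a hypothetical stand-in for the real API type.
type InferencePool struct {
	Name      string
	Namespace string
}

// Datastore is a simplified, hypothetical stand-in for the extension's
// data store. It only tracks the single pool assigned to the extension.
type Datastore struct {
	mu   sync.RWMutex
	pool *InferencePool
}

// ErrPoolNotInitialized is returned while no pool has been stored yet.
var ErrPoolNotInitialized = errors.New("InferencePool is not initialized in the data store")

// SetInferencePool records the pool, e.g. after the reconciler fetched it.
func (ds *Datastore) SetInferencePool(p *InferencePool) {
	ds.mu.Lock()
	defer ds.mu.Unlock()
	ds.pool = p
}

// Ready returns nil once the assigned pool is stored. It deliberately does
// not look at InferenceModels, matching the outcome of this thread.
func (ds *Datastore) Ready() error {
	ds.mu.RLock()
	defer ds.mu.RUnlock()
	if ds.pool == nil {
		return ErrPoolNotInitialized
	}
	return nil
}
```

A gRPC health `Check` handler could then map a nil result from `Ready()` to `SERVING` and any error to `NOT_SERVING`.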
- Introduced a health gRPC server to handle liveness and readiness probes.
- Refactored main() to manage server goroutines.
- Added graceful shutdown for servers and controller manager.
- Improved logging consistency and ensured.
- Validates CLI flags.

Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
@@ -124,7 +106,7 @@ func main() {
 		},
 		Record: mgr.GetEventRecorderFor("InferencePool"),
 	}).SetupWithManager(mgr); err != nil {
-		klog.Error(err, "Error setting up InferencePoolReconciler")
+		klog.Fatalf("Failed setting up InferencePoolReconciler: %v", err)
Note the switch to Fatalf in several places where an error is critical to the extension's startup.
/lgtm Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahg-g, danehans.
* Add health gRPC server and refactors main()

  - Introduced a health gRPC server to handle liveness and readiness probes.
  - Refactored main() to manage server goroutines.
  - Added graceful shutdown for servers and controller manager.
  - Improved logging consistency and ensured.
  - Validates CLI flags.

  Signed-off-by: Daneyon Hansen <[email protected]>

* Refactors health server to use data store

  Signed-off-by: Daneyon Hansen <[email protected]>

---------

Signed-off-by: Daneyon Hansen <[email protected]>
Adds a health gRPC server and refactors main() for better lifecycle management.

Fixes #96
Fixes #175