-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[observability][export-api] Env var config to write export events of specific source types #49541
[observability][export-api] Env var config to write export events of specific source types #49541
Conversation
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
…ita/export_api_enable_config
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
…ita/export_api_enable_config
Signed-off-by: Nikita Vemuri <[email protected]>
@@ -534,8 +534,22 @@ def gcs_actor_scheduling_enabled(): | |||
|
|||
RAY_BACKEND_LOG_JSON_ENV_VAR = "RAY_BACKEND_LOG_JSON" | |||
|
|||
# Write export API event of all resource types to file if enabled. | |||
# RAY_enable_export_api_write_config will not be considered if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should have this flag always be considered. This can be used as a hard shut off if this is not set or set to False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change wouldn't be backward compatible if RAY_enable_export_api_write_config
is always considered, right? I was thinking we can eventually deprecate RAY_enable_export_api_write
, which should have the same effect.
ray_export_api_config_json = json.loads( | ||
ray_constants.RAY_ENABLE_EXPORT_API_WRITE_CONFIG | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to parse this every time? Or just once on ray launch / first time this config is read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated to move the JSON parsing to where the env var is read
Signed-off-by: Nikita Vemuri <[email protected]>
src/ray/util/event.cc
Outdated
@@ -509,4 +509,27 @@ void RayEventInit(const std::vector<SourceTypeVariant> source_types, | |||
}); | |||
} | |||
|
|||
bool IsExportAPIEnabledSourceType(std::string source_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this called once per event or per source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be once per source now
Signed-off-by: Nikita Vemuri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the future we may want to support an RAY_enable_export_api_write_config=all
config for convenience. Do we expect to be able to re-use this config for the new obs stack export path?
Sounds good, we can add this as a followup
Yes, as long as we are ok with the naming |
Signed-off-by: Nikita Vemuri <[email protected]>
python/ray/_private/ray_constants.py
Outdated
"RAY_enable_export_api_write_config", "[]" | ||
) | ||
try: | ||
RAY_ENABLE_EXPORT_API_WRITE_CONFIG = json.loads( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, will RAY_enable_export_api_write_config
always be a list of resources or we are thinking about extending it to more complex json objects in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only be a flat list of resource types, so updated to a comma separated string so we can convert to a vector
python/ray/_private/ray_constants.py
Outdated
) | ||
except Exception: | ||
RAY_ENABLE_EXPORT_API_WRITE_CONFIG = None | ||
logger.exception("Error parsing JSON for RAY_enable_export_api_write_config. ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.exception("Error parsing JSON for RAY_enable_export_api_write_config. ") | |
logger.exception(f"Error parsing JSON for RAY_enable_export_api_write_config. RAY_enable_export_api_write_config={RAY_ENABLE_EXPORT_API_WRITE_CONFIG_STR}") |
src/ray/common/ray_config_def.h
Outdated
// resource types in ExportEvent.SourceType enum in | ||
// src/ray/protobuf/export_api/export_event.proto | ||
// Example config: `export RAY_enable_export_api_write_config='["EXPORT_ACTOR"]'` | ||
RAY_CONFIG(std::string, enable_export_api_write_config, "[]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the config will always be a list, I'm wondering if it can be converted to a vector here instead of using std::string
type. In this case, the conversion will only be done once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, updated to use std::vector<std::string>
and have the config be a comma separated string instead of JSON
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
src/ray/common/ray_config_def.h
Outdated
@@ -932,10 +932,10 @@ RAY_CONFIG(int, object_manager_rpc_threads_num, 0) | |||
// this is enabled. | |||
RAY_CONFIG(bool, enable_export_api_write, false) | |||
|
|||
// JSON configuration that lists individual resource types to write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering is there a particular reason we need a space in the separator of the resource string? If not, for better maintenance of the environment variable, wondering if we can just use ,
to separate the resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can separate the resources with only ,
(the ConvertValue
function already removes extra spaces so either one doesn't require changing the implementation)
@@ -9,7 +9,7 @@ | |||
# RAY_enable_export_api_write_config env var must be set before importing | |||
# `ray` so the correct value is set for RAY_ENABLE_EXPORT_API_WRITE_CONFIG | |||
# even outside a Ray driver. | |||
os.environ["RAY_enable_export_api_write_config"] = '["EXPORT_SUBMISSION_JOB"]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering should we also add python test with multiple resources specified in the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the test_check_export_api_enabled
on line 48 to test multiple resources in the config.
The config on line 12 is needed for test_submission_job_export_events
where submission jobs use the env var outside a Ray driver. Probably we won't need to add other resources to this line unless there is another resource that export events outside a driver in the future.
Signed-off-by: Nikita Vemuri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! @jjyao can you take a look and help merge the PR? Thanks!!
Why are these changes needed?
RAY_enable_export_api_write_config
environment variable which can be set to a string containing a comma separated list of source types that export events should be written for. This is only considered ifRAY_enable_export_api_write
is not set, so this change is backward compatible.EXPORT_TASK
which we saw impacted task scheduling latency.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.