Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[observability][export-api] Env var config to write export events of specific source types #49541

Merged
merged 14 commits into from
Jan 22, 2025

Conversation

nikitavemuri
Copy link
Contributor

@nikitavemuri nikitavemuri commented Jan 2, 2025

Why are these changes needed?

  • Add RAY_enable_export_api_write_config environment variable which can be set to a string containing a comma separated list of source types that export events should be written for. This is only considered if RAY_enable_export_api_write is not set, so this change is backward compatible.
  • This will allow users to select which export events they need to be written based on their use case or performance implications. Eg: A user can list all export events except EXPORT_TASK which we saw impacted task scheduling latency.
  • This feature is in alpha stage currently, so there are no public docs yet. We will send individual updates to users who have expressed interest in this feature.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Nikita Vemuri added 2 commits December 15, 2024 22:21
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
@nikitavemuri nikitavemuri added the go add ONLY when ready to merge, run all tests label Jan 2, 2025
Nikita Vemuri added 6 commits January 2, 2025 09:14
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
Signed-off-by: Nikita Vemuri <[email protected]>
@nikitavemuri nikitavemuri marked this pull request as ready for review January 3, 2025 18:05
@nikitavemuri nikitavemuri requested a review from a team as a code owner January 3, 2025 18:05
@nikitavemuri nikitavemuri requested a review from alanwguo January 3, 2025 18:06
@@ -534,8 +534,22 @@ def gcs_actor_scheduling_enabled():

RAY_BACKEND_LOG_JSON_ENV_VAR = "RAY_BACKEND_LOG_JSON"

# Write export API event of all resource types to file if enabled.
# RAY_enable_export_api_write_config will not be considered if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should have this flag always be considered. This can be used as a hard shut off if this is not set or set to False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change wouldn't be backward compatible if RAY_enable_export_api_write_config is always considered, right? I was thinking we can eventually deprecate RAY_enable_export_api_write, which should have the same effect.

Comment on lines 161 to 163
ray_export_api_config_json = json.loads(
ray_constants.RAY_ENABLE_EXPORT_API_WRITE_CONFIG
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to parse this every time? Or just once on ray launch / first time this config is read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated to move the JSON parsing to where the env var is read

Signed-off-by: Nikita Vemuri <[email protected]>
@nikitavemuri nikitavemuri requested a review from alanwguo January 3, 2025 22:59
@@ -509,4 +509,27 @@ void RayEventInit(const std::vector<SourceTypeVariant> source_types,
});
}

bool IsExportAPIEnabledSourceType(std::string source_type,
Copy link
Contributor

@alanwguo alanwguo Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this called once per event or per source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be once per source now

Signed-off-by: Nikita Vemuri <[email protected]>
@nikitavemuri nikitavemuri requested a review from alanwguo January 13, 2025 20:54
Copy link
Contributor

@alanwguo alanwguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future we may want to support an RAY_enable_export_api_write_config=all config for convenience. Do we expect to be able to re-use this config for the new obs stack export path?

@nikitavemuri
Copy link
Contributor Author

I think in the future we may want to support an RAY_enable_export_api_write_config=all config for convenience.

Sounds good, we can add this as a followup

Do we expect to be able to re-use this config for the new obs stack export path?

Yes, as long as we are ok with the naming

Signed-off-by: Nikita Vemuri <[email protected]>
"RAY_enable_export_api_write_config", "[]"
)
try:
RAY_ENABLE_EXPORT_API_WRITE_CONFIG = json.loads(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, will RAY_enable_export_api_write_config always be a list of resources or we are thinking about extending it to more complex json objects in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only be a flat list of resource types, so updated to a comma separated string so we can convert to a vector

)
except Exception:
RAY_ENABLE_EXPORT_API_WRITE_CONFIG = None
logger.exception("Error parsing JSON for RAY_enable_export_api_write_config. ")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.exception("Error parsing JSON for RAY_enable_export_api_write_config. ")
logger.exception(f"Error parsing JSON for RAY_enable_export_api_write_config. RAY_enable_export_api_write_config={RAY_ENABLE_EXPORT_API_WRITE_CONFIG_STR}")

// resource types in ExportEvent.SourceType enum in
// src/ray/protobuf/export_api/export_event.proto
// Example config: `export RAY_enable_export_api_write_config='["EXPORT_ACTOR"]'`
RAY_CONFIG(std::string, enable_export_api_write_config, "[]")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the config will always be a list, I'm wondering if it can be converted to a vector here instead of using std::string type. In this case, the conversion will only be done once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, updated to use std::vector<std::string> and have the config be a comma separated string instead of JSON

Nikita Vemuri added 2 commits January 21, 2025 10:34
Signed-off-by: Nikita Vemuri <[email protected]>
@@ -932,10 +932,10 @@ RAY_CONFIG(int, object_manager_rpc_threads_num, 0)
// this is enabled.
RAY_CONFIG(bool, enable_export_api_write, false)

// JSON configuration that lists individual resource types to write
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering is there a particular reason we need a space in the separator of the resource string? If not, for better maintenance of the environment variable, wondering if we can just use , to separate the resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can separate the resources with only , (the ConvertValue function already removes extra spaces so either one doesn't require changing the implementation)

@@ -9,7 +9,7 @@
# RAY_enable_export_api_write_config env var must be set before importing
# `ray` so the correct value is set for RAY_ENABLE_EXPORT_API_WRITE_CONFIG
# even outside a Ray driver.
os.environ["RAY_enable_export_api_write_config"] = '["EXPORT_SUBMISSION_JOB"]'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering should we also add python test with multiple resources specified in the config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test_check_export_api_enabled on line 48 to test multiple resources in the config.

The config on line 12 is needed for test_submission_job_export_events where submission jobs use the env var outside a Ray driver. Probably we won't need to add other resources to this line unless there is another resource that export events outside a driver in the future.

Signed-off-by: Nikita Vemuri <[email protected]>
Copy link
Collaborator

@MengjinYan MengjinYan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! @jjyao can you take a look and help merge the PR? Thanks!!

@jjyao jjyao merged commit 67f9490 into ray-project:master Jan 22, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants