
[RLlib; Offline RL] 1. Fix multi-learner issue. #49194

Merged

Conversation

@simonsays1980 (Collaborator) commented Dec 10, 2024

Why are these changes needed?

Multi-learner setups have recently been failing with errors. The main cause was the repeated reinstantiation of the streaming split. This PR

  • Fixes the error by providing a single streaming split in OfflineData.sample.
  • Renames OfflineData.batch_iterator to OfflineData.batch_iterators.
  • Simplifies the overall execution logic of the OfflineData.sample method.
  • Provides the Learner with an additional iterator attribute that is assigned when Learner.update_from_iterator is called. This allows the Learner to pause the iterator between training iterations and to resume it when the next iteration starts (note that streaming_split iterators are repeatable).
  • Provides **kwargs to Learner.update_from_batches to improve customization and readability. This enables calling the learner_group.update method with kwargs in both multi- and single-learner mode (see the sketch after this list).
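
A minimal, hypothetical sketch of the Learner-side pattern described in the last two bullets (SketchLearner and _update_from_batch are illustrative names, not RLlib's actual internals): the first iterator received is cached on the Learner and reused across calls, and extra keyword arguments are forwarded to the per-batch update.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


class SketchLearner:
    """Illustrative stand-in for a Learner; not RLlib's actual class."""

    def __init__(self) -> None:
        # Holds the (repeatable) streaming-split iterator once assigned.
        self.iterator: Optional[Iterator] = None

    def update_from_iterator(
        self, iterator: Iterable, num_iters: int = 1, **kwargs: Any
    ) -> List[Dict]:
        # Keep the first incoming iterator. Later calls may hand over a
        # different split (remote calls are not ordered), which is ignored
        # on purpose: one Learner, same split.
        if not self.iterator:
            self.iterator = iter(iterator)

        results = []
        for _ in range(num_iters):
            batch = next(self.iterator, None)
            if batch is None:
                break
            # **kwargs lets callers customize the per-batch update in both
            # single- and multi-learner mode.
            results.append(self._update_from_batch(batch, **kwargs))
        return results

    def _update_from_batch(self, batch: Any, **kwargs: Any) -> Dict:
        # Placeholder for the actual gradient update on one batch.
        return {"batch_size": len(batch), **kwargs}


# Usage: the Learner keeps consuming from its cached iterator even if a new
# iterable is passed on a later call.
learner = SketchLearner()
print(learner.update_from_iterator(iter([[1, 2], [3, 4], [5, 6]]), num_iters=2, lr=1e-3))
print(learner.update_from_iterator(iter([[7, 8]]), num_iters=2, lr=1e-3))
```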

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…enerated. Otherwise this produces an error if the data is larger than in our sample files. Provided the Learner with an 'iterator' attribute that keeps this iterator and iterates over it.

Signed-off-by: simonsays1980 <[email protected]>
@simonsays1980 simonsays1980 added rllib RLlib related issues rllib-offline-rl Offline RL problems labels Dec 10, 2024
@simonsays1980 simonsays1980 marked this pull request as ready for review December 10, 2024 18:08
@@ -270,6 +270,9 @@ def __init__(
# and return the resulting (reduced) dict.
self.metrics = MetricsLogger()

# TODO (simon): Describe for what we need this and define the type here.
Contributor commented: 👍

@sven1977 (Contributor) left a comment:

Looks good to me. Let's add the comment about what the self.iterator property is about ...

@simonsays1980 simonsays1980 changed the title [RLlib; Offline RL] Fix multi-learner issue. [RLlib; Offline RL] 1. Fix multi-learner issue. Dec 11, 2024
@@ -1088,6 +1094,9 @@ def update_from_iterator(
"`num_iters` instead."
)

if not self.iterator:
Contributor commented:

Dumb question: What if self.iterator is already set (to a previously incoming DataIterator)? Would the now-incoming iterator be thrown away?

@simonsays1980 (Collaborator, Author) replied:

Yes, it would, and that is intentional: the incoming iterator might differ from the one already assigned to the Learner, but only because ray.get cannot call the learners in order, i.e. the learners could otherwise receive different streaming splits every training iteration, which we do not want. One Learner, same split.
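
For illustration, a hedged driver-side sketch of the "one Learner, same split" idea with a toy actor and dataset (ToyLearner is hypothetical, not the actual OfflineData/Learner code): the streaming split is created exactly once and each resulting DataIterator stays pinned to the same learner across training iterations.

```python
import ray


@ray.remote
class ToyLearner:
    """Hypothetical stand-in for a Learner actor."""

    def __init__(self) -> None:
        self.iterator = None

    def update_from_iterator(self, iterator) -> int:
        # Keep the first iterator ever received ("one Learner, same split").
        if self.iterator is None:
            self.iterator = iterator
        num_batches = 0
        # streaming_split iterators are repeatable, so the same split can be
        # consumed again in every training iteration.
        for _ in self.iterator.iter_batches(batch_size=32):
            num_batches += 1
        return num_batches


ray.init()
ds = ray.data.from_items([{"x": i} for i in range(1_000)])
learners = [ToyLearner.remote() for _ in range(2)]

# The split is created exactly once; re-creating it on every sample call was
# the source of the multi-learner errors this PR fixes.
iterators = ds.streaming_split(n=len(learners), equal=True)

for _ in range(3):
    # Each learner is always paired with the same DataIterator.
    print(ray.get([
        learner.update_from_iterator.remote(it)
        for learner, it in zip(learners, iterators)
    ]))
```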

@sven1977 (Contributor) left a comment:

Looks good to me now. Just one remaining question for my understanding.

@sven1977 sven1977 enabled auto-merge (squash) December 11, 2024 13:39
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Dec 11, 2024
@sven1977 sven1977 added rllib-newstack and removed go add ONLY when ready to merge, run all tests labels Dec 11, 2024
@sven1977 sven1977 self-assigned this Dec 11, 2024
@github-actions github-actions bot disabled auto-merge December 12, 2024 15:54
@sven1977 sven1977 enabled auto-merge (squash) December 13, 2024 13:01
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Dec 13, 2024
@sven1977 sven1977 merged commit 31b4302 into ray-project:master Dec 13, 2024
7 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Dec 17, 2024