[RLlib; Offline RL] 1. Fix multi-learner issue. #49194
Conversation
…enerated. This otherwise produces an error if the data is larger than in our sample files. Provided the Learner with an `iterator` attribute that keeps this iterator and iterates over it. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
rllib/core/learner/learner.py (outdated)
@@ -270,6 +270,9 @@ def __init__(
        # and return the resulting (reduced) dict.
        self.metrics = MetricsLogger()

        # TODO (simon): Describe for what we need this and define the type here.
👍
Looks good to me. Let's add the comment about what the `self.iterator` property is about ...
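A possible way to document and type the new attribute, as a sketch only (the `Optional[DataIterator]` annotation and the comment wording are assumptions, not the PR's final code):

```python
from typing import Optional

from ray.data import DataIterator


class Learner:
    def __init__(self) -> None:
        # Sketch: the data iterator assigned to this Learner on its first
        # `update_from_iterator` call. Keeping it here lets the Learner reuse
        # the same streaming split across training iterations instead of
        # receiving a freshly (re)assigned split each time.
        self.iterator: Optional[DataIterator] = None
```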
Signed-off-by: simonsays1980 <[email protected]>
@@ -1088,6 +1094,9 @@ def update_from_iterator(
                "`num_iters` instead."
            )

        if not self.iterator:
Dumb question: What if `self.iterator` is already set (to a previously incoming `DataIterator`)? Would the now-incoming `iterator` be thrown away?
Yes, it would. But we want this: the incoming iterator might differ from the one already assigned to the Learner, but only because `ray.get` cannot call the learners in order, i.e. the learners might get different streaming splits every training iteration. We do not want this to happen: one Learner, same split.
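A minimal sketch of that guard, assuming a simplified `update_from_iterator` signature rather than RLlib's exact one; `_update_from_batch` is a hypothetical helper standing in for the actual update step:

```python
def update_from_iterator(self, iterator, *, num_iters=None, **kwargs):
    # Pin the first incoming DataIterator to this Learner.
    if not self.iterator:
        self.iterator = iterator
    # From here on, always read from `self.iterator`. A later, different
    # `iterator` argument (e.g. because `ray.get` returned the learners in a
    # different order this training iteration) is deliberately discarded:
    # one Learner, same split.
    for i, batch in enumerate(self.iterator.iter_batches()):
        if num_iters is not None and i >= num_iters:
            break
        self._update_from_batch(batch)  # hypothetical per-batch update step
```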
Looks good to me now. Just one remaining question for my understanding.
Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?
Multi-learner setups were lately running with errors. The main reason was the reinstantiation of a streaming split. This PR

- Fixes the streaming-split reinstantiation in `OfflineData.sample`.
- Renames `OfflineData.batch_iterator` to `OfflineData.batch_iterators`.
- Simplifies the `OfflineData.sample` method in general.
- Provides the `learner` with an additional `iterator` attribute that gets assigned to when `Learner.update_from_iterator` is called. This allows the Learner to pause the iterator in between training iterations and to use it again when the next iteration takes its turn (note: `streaming_split` iterators are repeatable); see the toy sketch after this list.
- Adds `**kwargs` to `Learner.update_from_batches` to improve customization and readability. This enables calling the `learner_group.update` method with `kwargs` when in multi- OR in single-learner mode.
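The pause-and-resume idea can be illustrated with a framework-free toy sketch; `ToyLearner` and `fake_streaming_split` are made-up names and this is not RLlib's actual implementation:

```python
import itertools


class ToyLearner:
    def __init__(self):
        self.iterator = None  # assigned on the first update call

    def update_from_iterator(self, iterator, num_iters=2):
        if self.iterator is None:
            # Pin the split once; later incoming iterators are ignored.
            self.iterator = iter(iterator)
        # Consume only `num_iters` batches, then pause. The next call
        # resumes exactly where this one stopped.
        return list(itertools.islice(self.iterator, num_iters))


def fake_streaming_split():
    # Stand-in for a repeatable streaming-split iterator.
    yield from range(6)


learner = ToyLearner()
print(learner.update_from_iterator(fake_streaming_split()))  # [0, 1]
print(learner.update_from_iterator(fake_streaming_split()))  # [2, 3] -- resumes, new iterator is ignored
```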
Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.