[RLlib; docs] Docs do-over (new API stack): Env pages vol 01. #49165
Conversation
@@ -0,0 +1,20 @@
from ray.rllib.env.multi_agent_env import make_multi_agent
Simply moved some example classes in here for order.
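For context on the utility imported in the moved file, here is a minimal usage sketch of make_multi_agent. The env name and agent count below are arbitrary picks for illustration, not taken from the PR:

from ray.rllib.env.multi_agent_env import make_multi_agent

# make_multi_agent wraps a single-agent env (given by its registered name or an
# env-maker callable) into a MultiAgentEnv class that runs n independent copies.
MultiAgentCartPole = make_multi_agent("CartPole-v1")

# The number of copies is set via the env config (2 is an arbitrary choice here).
env = MultiAgentCartPole({"num_agents": 2})

# Observations come back as a dict mapping agent IDs to per-copy observations.
obs, infos = env.reset()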
LGTM. Awesome PR!
In the long run, we should ask a professional designer to make these diagrams.
So, does this mean that the top agent acts whenever the lower ones don't, or could this happen simultaneously?
Both are possible. Our example script always has only one level acting at a time.
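To make the turn-taking variant concrete, here is a minimal sketch of a two-level env in which only one level receives an observation, and therefore acts, per timestep. All class and agent names, spaces, and the step schedule are made up for illustration; it only assumes the new API stack's MultiAgentEnv attributes (agents, possible_agents, observation_spaces, action_spaces):

import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoLevelEnvSketch(MultiAgentEnv):
    """Hypothetical env in which the high-level agent acts only every 4th step."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["high_level", "low_level"]
        self.observation_spaces = {aid: gym.spaces.Discrete(4) for aid in self.agents}
        self.action_spaces = {aid: gym.spaces.Discrete(2) for aid in self.agents}

    def reset(self, *, seed=None, options=None):
        self.t = 0
        # Only the high-level agent receives the first observation -> it acts first.
        return {"high_level": 0}, {}

    def step(self, action_dict):
        self.t += 1
        # Hand the next observation to exactly one level: the high level every
        # 4th step, the low level in between. An agent only acts on the step
        # after it received an observation.
        next_agent = "high_level" if self.t % 4 == 0 else "low_level"
        obs = {next_agent: self.t % 4}
        # Rewards are keyed by the agent(s) that just acted.
        rewards = {aid: 0.0 for aid in action_dict}
        terminateds = {"__all__": self.t >= 20}
        return obs, rewards, terminateds, {"__all__": False}, {}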
"""Two-player environment for the famous rock paper scissors game. | ||
|
||
# __sphinx_doc_1_end__ | ||
Optionally, the "Sheldon Cooper extension" can be activated by passing |
Hilarious! :D
# The observations are always the last taken actions. Hence observation- and
# action spaces are identical.
self.observation_spaces = self.action_spaces = {
Maybe simplify to:

self.sheldon_cooper_mode = self.config.get("sheldon_cooper_mode", False)
if self.sheldon_cooper_mode:
    num_actions = 5
else:
    num_actions = 3
self.action_spaces = self.observation_spaces = {
    "player1": gym.spaces.Discrete(num_actions),
    "player2": gym.spaces.Discrete(num_actions),
}
Sure, but I wanted to leave the Sheldon Cooper mode out of the docs entirely (to keep the docs as simple as possible). Therefore, I had to spatially separate these two pieces of logic in the file.
| 6| 7| 8|
----------
The action space is Discrete(9) and actions landing on an alredy occupied field |
"alredy" -> "already"
done
win_val = [-1, -1, -1]
if (
    # Horizontal win.
    self.board[:3] == win_val
Very cool!
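For readers skimming the excerpt: the slice trick generalizes to all eight winning lines of a flat 3x3 board. A self-contained paraphrase (not the PR's exact code), assuming the board is a flat Python list of 9 fields and the current player's marks are -1:

def has_won(board, mark=-1):
    """Return True if `mark` occupies a full row, column, or diagonal."""
    win_val = [mark] * 3
    return (
        # Rows.
        board[:3] == win_val or board[3:6] == win_val or board[6:] == win_val
        # Columns (stride-3 slices).
        or board[0::3] == win_val or board[1::3] == win_val or board[2::3] == win_val
        # Diagonals: indices (0, 4, 8) and (2, 4, 6).
        or board[0::4] == win_val or board[2:7:2] == win_val
    )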
):
    # Final reward is +5 for victory and -5 for a loss.
    rewards[self.current_player] += 5.0
    rewards[opponent] = -5.0
I wonder if it works better when win and loss are rewarded with a different amount than a wrong placement?
They are rewarded separately, with +1.0 and -1.0.
The misplacement penalty should be learnt pretty quickly by the agents (b/c it hurts a lot) and after that, they should be able to "focus" on the actual game, not misplacing any pieces anymore. 🤞
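A small sketch of keeping the two reward signals separate, as discussed. The +/-5.0 terminal rewards come from the diff above; the empty-field check (value 0) and the -1.0 penalty value are assumptions for illustration only:

def compute_rewards(board, action, current_player, opponent, won):
    """Sketch: illegal placements and game outcomes are rewarded independently."""
    rewards = {current_player: 0.0, opponent: 0.0}
    if board[action] != 0:
        # Illegal placement: a small, immediate penalty for the mover only
        # (the exact value and handling in the example env may differ).
        rewards[current_player] -= 1.0
    elif won:
        # Terminal rewards are much larger, so once the agents stop misplacing
        # pieces, winning or losing dominates the return.
        rewards[current_player] += 5.0
        rewards[opponent] = -5.0
    return rewards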
return (
    {self.current_player: np.array(self.board, np.float32)},
    rewards,
Maybe add a comment here that tells users how these rewards are handled in the MultiAgentEpisode: in there, each reward is treated as belonging to the last current player (the one that sent the action). This is counter-intuitive at first for new users.
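One possible phrasing for such a comment, placed right above the return (the wording paraphrases the reviewer's point and is not the PR's final text):

# NOTE: In the MultiAgentEpisode, the rewards returned here are credited to the
# agent they are keyed by, i.e. to the player that just sent the action, rather
# than to the player receiving the next observation. This is counter-intuitive
# at first for new users of turn-based, multi-agent envs.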
[RLlib] Docs do-over (new API stack): Env pages vol 01
examples/envs/classes/multi_agent/..
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.