feat(GRPOTrainer): reward_func return None to skip #2737

Open
ctjlewis opened this issue Feb 2, 2025 · 0 comments
Labels
✨ enhancement New feature or request 🏋 GRPO Related to GRPO

Feature request

Allow a reward_func to skip an item in the dataset by returning None.

Motivation

Currently, huggingface/open-r1 returns a reward of 1 for cases where the answer cannot be processed or where we otherwise want to skip the case; see huggingface/open-r1#159.

We likely don't want to provide the same reward signal as a correct answer just because there was an error parsing the gold answer, or an issue with the Math-Verify library for that case. We want a way to actually skip that case and give neither a reward nor a penalty.
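To make the request concrete, here is a rough sketch of what "return None to skip" could look like. It does not use TRL's actual API: parse_number, accuracy_reward, and group_advantages are hypothetical names, the numeric parser stands in for Math-Verify, and the group-normalized advantage is only an assumption about how a trainer might consume the rewards.

```python
import statistics
from typing import List, Optional


def parse_number(text: str) -> Optional[float]:
    """Hypothetical parser (stand-in for Math-Verify): None means 'could not parse'."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def accuracy_reward(completions: List[str], gold: str) -> List[Optional[float]]:
    """Hypothetical reward_func: returns None for every completion when the gold answer is unusable."""
    gold_value = parse_number(gold)
    if gold_value is None:
        # The case cannot be scored, so skip it instead of rewarding it.
        return [None] * len(completions)
    return [1.0 if parse_number(c) == gold_value else 0.0 for c in completions]


def group_advantages(rewards: List[Optional[float]], eps: float = 1e-4) -> List[float]:
    """Assumed trainer-side handling: normalize over scored items only; skipped items get 0."""
    scored = [r for r in rewards if r is not None]
    if not scored:
        # Whole group skipped: no reward and no penalty for any completion.
        return [0.0] * len(rewards)
    mean = statistics.fmean(scored)
    std = statistics.pstdev(scored)
    return [0.0 if r is None else (r - mean) / (std + eps) for r in rewards]
```

With this shape, an unparseable gold answer yields a group of all-None rewards and therefore zero advantage everywhere, rather than the same signal as a correct answer.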

Your contribution

We should probably discuss the best way to do this and hear the open-r1 authors' reasoning behind the original logic first. I could open a PR for it, though the original authors would probably be faster.

@ctjlewis ctjlewis changed the title feat(GRPO): reward_func return None to skip feat(GRPOTrainer): reward_func return None to skip Feb 2, 2025
@github-actions github-actions bot added 🏋 GRPO Related to GRPO ✨ enhancement New feature or request labels Feb 2, 2025