ctjlewis changed the title from "feat(GRPO): reward_func return None to skip" to "feat(GRPOTrainer): reward_func return None to skip" on Feb 2, 2025
Feature request
Allow us to skip an item in the dataset by returning `None` from a `reward_func`.
Motivation
Currently, huggingface/open-r1 returns reward: 1 for cases where the answer cannot be processed or where we otherwise want to skip the case; see huggingface/open-r1#159.
We likely don't want to give the same reward signal as a correct answer just because the gold answer failed to parse, or because the Math-Verify library had an issue with that case. We want a way to actually skip such a case and give neither reward nor penalty.
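To make the request concrete, here is a rough sketch of what a skipping reward function could look like. It is plain Python and does not use the TRL or Math-Verify APIs: `parse_number`, `accuracy_reward`, and `mean_ignoring_skips` are hypothetical names, and how `GRPOTrainer` would actually consume a `None` entry is exactly the open question of this issue.

```python
import math
from typing import List, Optional


def parse_number(text: str) -> Optional[float]:
    """Hypothetical stand-in for a real answer parser; returns None on failure."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def accuracy_reward(completions: List[str], golds: List[str]) -> List[Optional[float]]:
    """Return 1.0 / 0.0 per sample, or None when the gold answer cannot be parsed."""
    rewards: List[Optional[float]] = []
    for completion, gold in zip(completions, golds):
        gold_value = parse_number(gold)
        if gold_value is None:
            # Proposed behavior: skip the sample instead of rewarding it with 1.0.
            rewards.append(None)
            continue
        answer = parse_number(completion)
        correct = answer is not None and math.isclose(answer, gold_value)
        rewards.append(1.0 if correct else 0.0)
    return rewards


def mean_ignoring_skips(rewards: List[Optional[float]]) -> Optional[float]:
    """Assumed trainer-side handling: average only over non-skipped samples."""
    kept = [r for r in rewards if r is not None]
    return sum(kept) / len(kept) if kept else None
```

With this shape, a sample whose gold answer fails to parse contributes nothing to the aggregate, rather than being counted as correct.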
Your contribution
We should probably talk about the best way to do this and hear the open-r1 authors' reasoning behind the original logic first. But yes, I could open a PR, though the original authors would probably be faster.