feat(GRPOTrainer): `reward_func` return `None` to skip #2737

ctjlewis · 2025-02-02T08:25:50Z

Feature request

Allow us to skip an item in the dataset by returning `None` from a `reward_func`.

Motivation

Currently, huggingface/open-r1 returns a reward of 1 for cases where we cannot process the answer or otherwise want to skip a case; see huggingface/open-r1#159.

We likely don't want to provide the same reward signal as a correct answer just because there's an error parsing the gold answer, or an issue with the Math-Verify library for that case. We want a way to actually skip that case and give neither a reward nor a penalty.
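To make the request concrete, here is a rough sketch of the behavior I have in mind. This is not TRL's or open-r1's actual code: `parse_number` is a hypothetical stand-in for a real verifier such as Math-Verify, the `accuracy_reward(completions, gold_answers)` signature is simplified, and `mean_reward_ignoring_skips` only illustrates one way a trainer might treat `None`.

```python
from typing import List, Optional, Sequence


def parse_number(text: str) -> Optional[float]:
    """Hypothetical parser standing in for a real answer verifier."""
    try:
        return float(text.strip())
    except ValueError:
        return None  # could not parse


def accuracy_reward(
    completions: Sequence[str], gold_answers: Sequence[str]
) -> List[Optional[float]]:
    """Return 1.0 / 0.0 per completion, or None when the case should be skipped."""
    rewards: List[Optional[float]] = []
    for completion, gold in zip(completions, gold_answers):
        gold_value = parse_number(gold)
        if gold_value is None:
            # The gold answer itself is unparseable: skip instead of rewarding 1.
            rewards.append(None)
            continue
        rewards.append(1.0 if parse_number(completion) == gold_value else 0.0)
    return rewards


def mean_reward_ignoring_skips(rewards: Sequence[Optional[float]]) -> Optional[float]:
    """Assumed trainer-side handling: average only the non-skipped rewards."""
    kept = [r for r in rewards if r is not None]
    return sum(kept) / len(kept) if kept else None


rewards = accuracy_reward(["4", "5", "7"], ["4", "4", "n/a"])
print(rewards)
print(mean_reward_ignoring_skips(rewards))
```

With this, the third sample (unparseable gold answer) contributes nothing to the aggregate, rather than counting as a correct answer. How skipped samples should interact with the per-group advantage computation is an open question.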

Your contribution

We should probably discuss the best way to do this and hear the open-r1 authors' reasoning behind the original logic first. That said, I could open a PR for it, though the original authors would probably be faster.

ctjlewis changed the title ~~feat(GRPO): reward_func return None to skip~~ feat(GRPOTrainer): reward_func return None to skip Feb 2, 2025

github-actions bot added 🏋 GRPO Related to GRPO ✨ enhancement New feature or request labels Feb 2, 2025
