feat(GRPOTrainer): reward_func return None to skip #83
Job | Run time |
---|---|
 | 25s |
 | 25s |