Issues: huggingface/trl

Issues list

Multi-node training with deepspeed launcher 🚀 deepspeed Related to deepspeed 🏋 SFT Related to SFT
#2605 opened Jan 22, 2025 by ghtaro
5 tasks done
[Question] DataCollatorForCompletionOnlyLM with dynamic padding? ❓ question Seeking clarification or more information
#2603 opened Jan 22, 2025 by katzurik
[Question] Log eval metrics performed during training to files 📚 documentation Improvements or additions to documentation ❓ question Seeking clarification or more information
#2602 opened Jan 22, 2025 by skandermoalla
Add the training method for DeepSeek-R1 ✨ enhancement New feature or request
#2599 opened Jan 21, 2025 by MohamedAliRashad
Potential bug in PPO Trainer 🐛 bug Something isn't working 🏋 PPO Related to PPO
#2596 opened Jan 21, 2025 by kyleliang919
5 tasks done
PRM Performance on Different Data Type 🏋 PRM Related to PRM ❓ question Seeking clarification or more information
#2591 opened Jan 19, 2025 by TanZhendong
wandb step slider implementation in example notebook ❓ question Seeking clarification or more information
#2589 opened Jan 18, 2025 by stellaludai
GKD trainer doesn't work too well with the llama series 🐛 bug Something isn't working 🏋 GKD Related to GKD
#2586 opened Jan 17, 2025 by Omar-Deepshard
5 tasks done
Make PPOTrainer compatible with PRMs ✨ enhancement New feature or request 🏋 PPO Related to PPO
#2577 opened Jan 16, 2025 by kyleliang919
ORPO on SFT dataset 🏋 ORPO Related to ORPO ❓ question Seeking clarification or more information
#2570 opened Jan 15, 2025 by vitalyshalumov
7 of 9 tasks
RuntimeError: Function 'Log1PBackward0' returned nan values in its 0th output. 🐛 bug Something isn't working 🏋 ORPO Related to ORPO
#2564 opened Jan 13, 2025 by zhaoxjmail
7 of 9 tasks
dpo_vlm.py 🐛 bug Something isn't working 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models
#2563 opened Jan 12, 2025 by liuchaohu
5 of 9 tasks
Problem with accelerate>=1.0.0 when running official PPO/RLOO examples ⚡accelerate Related to accelerate 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO
#2555 opened Jan 10, 2025 by dawidm
7 of 9 tasks
Finetuning on the last turn of multi-turn conversations ❓ question Seeking clarification or more information 🏋 SFT Related to SFT
#2545 opened Jan 6, 2025 by okhat
Different finetune speed in DPO task of peft and ms-swift (600/S iter vs 30/s iter) 🏋 DPO Related to DPO 🙋 help from community wanted Open invitation for community members to contribute ⚡ PEFT Related to PEFT
#2536 opened Jan 2, 2025 by maoulee
7 of 9 tasks
(Willing to PR) Will it be welcomed if speeding up algorithms like PPO and code refactor/cleanup? 🏋 PPO Related to PPO ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO
#2535 opened Dec 31, 2024 by fzyzcjy
onlinedpo error when use deepspeed zero3 🐛 bug Something isn't working 🚀 deepspeed Related to deepspeed ⏳ needs more info Additional information or clarification is required to proceed 🏋 Online DPO Related to Online DPO
#2532 opened Dec 30, 2024 by yiyepiaoling0715
5 of 9 tasks
PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way 🐛 bug Something isn't working 🏋 PPO Related to PPO
#2530 opened Dec 29, 2024 by dawidm
6 of 9 tasks