From 724acb9716d5124c83993a3d92b6e6bb8bb73668 Mon Sep 17 00:00:00 2001
From: Sergio Paniego Blanco
Date: Thu, 6 Feb 2025 18:28:05 +0100
Subject: [PATCH] 💡 Add 'Post training an LLM for reasoning with GRPO in TRL'
 tutorial (#2785)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/source/community_tutorials.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source/community_tutorials.md b/docs/source/community_tutorials.md
index b23f60ccff..0cd5e8e25f 100644
--- a/docs/source/community_tutorials.md
+++ b/docs/source/community_tutorials.md
@@ -6,6 +6,7 @@ Community tutorials are made by active members of the Hugging Face community who
 | Task | Class | Description | Author | Tutorial | Colab |
 | --- | --- | --- | --- | --- | --- |
+| Reinforcement Learning | [`GRPOTrainer`] | Post training an LLM for reasoning with GRPO in TRL | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_grpo_trl.ipynb) |
 | Reinforcement Learning | [`GRPOTrainer`] | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/mini-deepseek-r1) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb) |
 | Instruction tuning | [`SFTTrainer`] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-google-gemma) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/gemma-lora-example.ipynb) |
 | Structured Generation | [`SFTTrainer`] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | [Mohammadreza Esmaeilian](https://huggingface.co/Mohammadreza) | [Link](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format.ipynb) |
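
For context, the new row points readers at a GRPO post-training recipe built on TRL's [`GRPOTrainer`]. Below is a minimal sketch of that workflow, assuming a recent TRL release; the toy dataset, model checkpoint, config values, and length-based reward are illustrative stand-ins, not code from the linked tutorial.

```python
# Minimal GRPO sketch (illustrative; names below are assumptions, not the tutorial's code).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt-only dataset: GRPO samples its own completions for each prompt.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is 2 + 2?", "Name a prime number greater than 10."]}
)

def reward_len(completions, **kwargs):
    # Hypothetical reward: favor shorter completions (a stand-in for the
    # reasoning/correctness reward a real recipe would use).
    return [-float(len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="grpo-demo",          # illustrative output path
    num_generations=2,               # completions sampled per prompt (group size)
    per_device_train_batch_size=2,   # effective batch size must be divisible by num_generations
    max_completion_length=64,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # any small causal LM checkpoint works here
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

The group size matters because GRPO scores each sampled completion with the reward function and computes advantages relative to the other completions in the same group, pushing the policy toward higher-reward generations without a separate value model.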