simple question #18

Open
honglf opened this issue Dec 20, 2024 · 1 comment

Comments

@honglf

honglf commented Dec 20, 2024

I'm new to large language models, and I have a simple question:
From the paper, the gym, the proactive agent, and the user agent together form a pipeline. Is this pipeline used to fine-tune a large language model? And since ragent serves as the demo, should its private.toml be filled in with the address of the fine-tuned model mentioned above?

@SummerFall1819
Collaborator

The pipeline is used to generate synthetic data, which is then used to fine-tune the reward model and the proactive agent.
The difference between the pipeline and the demo is that the pipeline creates a virtual environment in which the agent interacts with a simulated user, and the data generated there is used to fine-tune the model, whereas the demo lets you interact directly with a prompted LLM.
You may point private.toml at your fine-tuned model (provided it supports the OpenAI SDK, though we didn't mention that in the README), but simply using the original models is also fine.
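For reference, a minimal sketch of what such a configuration might look like, assuming private.toml takes OpenAI-style fields (base URL, API key, and model name). The key names below are illustrative assumptions, not the repository's actual schema; check the private.toml template in the repo for the real field names.

```toml
# Hypothetical example -- field names are assumptions; consult the
# repository's private.toml template for the actual keys it expects.
[agent]
base_url = "http://localhost:8000/v1"   # address of your fine-tuned, OpenAI-compatible endpoint
api_key  = "sk-placeholder"             # key for that endpoint (a dummy value for most local servers)
model    = "your-finetuned-model"       # or simply an original model name
```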

