simple question #18

Open
honglf opened this issue Dec 20, 2024 · 1 comment

Comments

@honglf

honglf commented Dec 20, 2024

I'm new to large language models, and I have a simple question:
From the paper, the gym, the proactive agent, and the user agent together form a pipeline. Is this pipeline used to fine-tune a large language model? And since ragent serves as the demo, should its private.toml be filled in with the address of the fine-tuned model mentioned above?

@SummerFall1819
Collaborator

The pipeline is used to generate synthetic data, which is then used to fine-tune the reward model and the proactive agent.
The difference between the pipeline and the demo is that the pipeline creates a virtual environment in which the agent interacts with a simulated user, and the data generated there is used to fine-tune the model, whereas the demo lets you interact directly with a prompted LLM.
You may point private.toml at your fine-tuned model (provided it supports the OpenAI SDK, though we didn't mention that in the README), but simply using the original models is also fine.
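For reference, a minimal sketch of what such a configuration might look like, assuming private.toml takes OpenAI-style fields (base URL, API key, and model name). The key names below are illustrative assumptions, not the repository's actual schema; check the private.toml template in the repo for the real field names.

```toml
# Hypothetical example -- field names are assumptions; consult the
# repository's private.toml template for the actual keys it expects.
[agent]
base_url = "http://localhost:8000/v1"   # address of your fine-tuned, OpenAI-compatible endpoint
api_key  = "sk-placeholder"             # key for that endpoint (a dummy value for most local servers)
model    = "your-finetuned-model"       # or simply an original model name
```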

