The pipeline is used to generate synthetic data for fine-tuning the reward model and the proactive agent.
The difference between the pipeline and the demo is that the pipeline creates a virtual environment in which the agent interacts with a simulated user, and the data generated there is used to fine-tune the model, whereas the demo provides a real interaction between you and a prompted LLM.
You may fill in private.toml with your fine-tuned model (provided it supports the OpenAI SDK, though we didn't mention that in the README), but simply using the original models is also fine.
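As a rough illustration, an OpenAI-SDK-compatible entry in private.toml might look like the sketch below. The section name and keys (`base_url`, `api_key`, `model`) are assumptions based on common OpenAI-compatible setups, not the repository's actual schema — check the project's example config for the real key names:

```toml
# Hypothetical sketch — section and key names are assumptions, not the project's actual schema.
[agent]
base_url = "http://localhost:8000/v1"   # endpoint of your fine-tuned, OpenAI-SDK-compatible model
api_key = "placeholder"                 # many local serving backends accept any non-empty key
model = "your-finetuned-model"          # model name registered with the serving backend
```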
I'm new to large language models and have a simple question:
From the paper, the gym, the proactive agent, and the user agent together form a pipeline. Is this pipeline used to fine-tune the LLM? And since ragent serves as the demo, should its private.toml be filled in with the address of the fine-tuned model mentioned above?