
[DPO] Refactor eval logging of dpo trainer #954

Merged: 6 commits into huggingface:main on Nov 30, 2023

Conversation

@mnoukhov (Contributor) commented Nov 3, 2023

Address part 1 of #952 by removing store_metrics and switching to the more standard compute_metrics, letting the Trainer handle logging.
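For reference, a generic sketch of the standard compute_metrics hook the refactor leans on (the accuracy metric and names are illustrative, not the actual DPO metrics): the Trainer's evaluation loop collects predictions and labels, passes them to this function, and logs the returned dict with an eval_ prefix.

```python
import numpy as np
from transformers import EvalPrediction

def compute_metrics(eval_pred: EvalPrediction):
    # eval_pred.predictions / eval_pred.label_ids are numpy arrays collected
    # by the Trainer's evaluation loop (already gathered across processes).
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

# Passed at construction time; the Trainer calls it during evaluate() and
# handles logging of the returned metrics itself:
# trainer = Trainer(..., compute_metrics=compute_metrics)
```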

@lvwerra (Member) commented Nov 7, 2023

Thanks for working on this! Just ping me whenever you would like to run the CI.

@mnoukhov changed the title from "[DPO] WIP of refactor of dpo trainer" to "[DPO] Refactor eval of dpo trainer" on Nov 9, 2023
@mnoukhov (Contributor, Author) commented Nov 9, 2023

@lvwerra, I completed this part of the refactor, can you run the CI?

@kashif as part of this, I removed logging all the logits since it seemed a bit excessive to pass around the batch x seq_len x vocab_size array, and taking the mean didn't feel like a meaningful thing to log since we already have the logps. Let me know if you'd like this reversed.

@mnoukhov changed the title from "[DPO] Refactor eval of dpo trainer" to "[DPO] Refactor eval logging of dpo trainer" on Nov 9, 2023
@mnoukhov marked this pull request as ready for review on November 9, 2023, 04:51
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@kashif (Collaborator) commented Nov 9, 2023

@mnoukhov are you also logging sample outputs here?

@mnoukhov (Contributor, Author) commented Nov 9, 2023

@kashif I didn't change sample output logging in this PR, so it's still there. I was going to address the sample logging issue (part 2 of #952) in a second PR to keep things manageable.

@kashif (Collaborator) commented Nov 9, 2023

ok makes sense!

@kashif (Collaborator) commented Nov 29, 2023

So this PR LGTM! Feel free to merge.

@lvwerra merged commit 6d9ea38 into huggingface:main on Nov 30, 2023 (9 checks passed)
lvwerra added a commit that referenced this pull request Dec 1, 2023
@kashif (Collaborator) commented Dec 1, 2023

@mnoukhov so folks really want the additional metrics to be logged during training, which isn't possible at the moment, so let's revert this for now and continue this refactoring to figure that out.

lvwerra added a commit that referenced this pull request Dec 1, 2023
@mnoukhov (Contributor, Author) commented Dec 1, 2023

I don't think this change prevents additional metrics from being logged. I can create a follow-up PR that addresses those comments.

@kashif (Collaborator) commented Dec 1, 2023

Ok, so let's try to add back the previous behavior if possible. I couldn't figure it out at first glance; the Trainer's training loop doesn't seem to have a way to do it. Can you kindly confirm?

@mnoukhov (Contributor, Author) commented Dec 1, 2023

Basically, what #1046 wants is to log metrics during training, which is not how Trainer usually works but is a common ask, e.g. https://discuss.huggingface.co/t/logging-training-accuracy-using-trainer-class/5524/3. The reverted code does not do this correctly.

AFAIK the problem with trying to put self.log into compute_loss is that it doesn't really make sense when you use multi-GPU, and it didn't account for different batch sizes. The code puts all the metrics from a single GPU's minibatch into stored_metrics and then takes the mean at each self.log call. There needs to be an accelerator.gather_for_metrics somewhere in there in order to account for different batch sizes and multi-GPU. @lewtun, is this right?
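A minimal sketch of what that gathering could look like, assuming illustrative helper names (stored_metrics here is just a plain dict, not the trainer attribute): per-example metric tensors are gathered across processes with accelerator.gather_for_metrics before being averaged, so uneven batch sizes don't skew the mean.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
stored_metrics = {}  # plain dict standing in for the trainer's stored_metrics

def store_batch_metric(name, per_example_values):
    # per_example_values: 1-D tensor of a metric for this process's mini-batch,
    # e.g. (chosen_rewards > rejected_rewards).float()
    gathered = accelerator.gather_for_metrics(per_example_values)
    stored_metrics.setdefault(name, []).append(gathered.cpu())

def mean_metric(name):
    # Average over every gathered example from every process, not over
    # per-process means, so different batch sizes are weighted correctly.
    return torch.cat(stored_metrics[name]).mean().item()
```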

If you want metrics for the training set, we can either keep this stored_metrics approach and try to make it work on multi-GPU, but that feels a bit ugly, or call evaluate on the training set any time it is run on the test set, which has its own set of problems.
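A rough sketch of that second option, using a hypothetical callback (none of this is code from the PR): re-run evaluate() on a training subset whenever an evaluation happens, with a guard because evaluate() fires on_evaluate again.

```python
from transformers import TrainerCallback

class EvalOnTrainSubsetCallback(TrainerCallback):
    """Hypothetical: whenever the Trainer evaluates, also evaluate on a
    subset of the training data and log it under a "train" prefix."""

    def __init__(self, trainer, train_subset):
        self.trainer = trainer
        self.train_subset = train_subset
        self._running = False  # evaluate() triggers on_evaluate, so guard against recursion

    def on_evaluate(self, args, state, control, **kwargs):
        if self._running:
            return
        self._running = True
        try:
            self.trainer.evaluate(eval_dataset=self.train_subset,
                                  metric_key_prefix="train")
        finally:
            self._running = False

# illustrative usage:
# trainer.add_callback(EvalOnTrainSubsetCallback(trainer, small_train_subset))
```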

lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
* first attempts at refactor of dpo trainer

* removed extra stuff in prediction step

* import fixes

* label names

* all working

---------

Co-authored-by: Leandro von Werra <[email protected]>
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024