Problem converting DeBERTaV3 to ONNX using optimum-cli #2075

Open
2 of 4 tasks
marcovzla opened this issue Oct 21, 2024 · 0 comments
Labels
bug Something isn't working

Comments

marcovzla commented Oct 21, 2024

System Info

$ pip freeze | grep optimum
optimum==1.23.1

$ python -V
Python 3.11.2

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

$ optimum-cli export onnx \
    --framework pt \
    --model microsoft/deberta-v3-base \
    --task text-classification \
    output-dir
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py:558: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:547: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(mid - 1).type_as(relative_pos),
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:551: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.ceil(torch.log(abs_pos / mid) / torch.log(torch.tensor((max_position - 1) / mid)) * (mid - 1)) + mid
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:710: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:710: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:785: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:785: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:797: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:797: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:798: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if key_layer.size(-2) != query_layer.size(-2):
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:105: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))

Expected behavior

I am trying to export my own fine-tuned DeBERTaV3 model and get the TracerWarnings shown above. Here I am using microsoft/deberta-v3-base as an example, so please ignore the warning about some weights being newly initialized. The real problem is the TracerWarnings: an ONNX model is actually generated, but it is incorrect. It seems to always predict the same label.
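To illustrate, this is roughly how I am checking the exported model (a minimal sketch; the paths and model id match the reproduction above, and I only feed the inputs the exported graph actually declares):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# the export above writes output-dir/model.onnx
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
session = ort.InferenceSession("output-dir/model.onnx")

for text in ["great movie", "terrible movie", "the sky is blue"]:
    enc = tokenizer(text, return_tensors="np")
    # keep only the inputs the ONNX graph expects
    feed = {i.name: enc[i.name] for i in session.get_inputs() if i.name in enc}
    logits = session.run(None, feed)[0]
    print(text, np.argmax(logits, axis=-1))

In my runs the argmax is the same for every input.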

I also see similar TracerWarnings with microsoft/deberta-v2-xlarge, and fewer TracerWarnings with microsoft/deberta-base, but I haven't checked whether those converted models are incorrect too.
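In case it is useful, my understanding of why these warnings matter (a toy example of my own, not taken from the DeBERTa code): tracing bakes the result of torch.tensor(...) into the graph as a constant, so anything computed from it stops depending on the actual inputs.

import torch

def scale_by_half_width(x):
    mid = x.size(-1) // 2         # a plain Python int
    return x * torch.tensor(mid)  # traced as a constant -> TracerWarning

traced = torch.jit.trace(scale_by_half_width, torch.ones(4))
print(traced(torch.ones(4)))  # tensor([2., 2., 2., 2.]), correct for the traced shape
print(traced(torch.ones(8)))  # still multiplies by 2, not 4: the constant was baked in

If something similar happens in the relative-position code, the exported graph would only be right for the dummy shapes used during tracing.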

If optimum-cli export onnx doesn't support DeBERTaV3, I would appreciate a pointer on how to do the conversion from code, or any other possible solution.
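For reference, the programmatic route I am aware of (a sketch; I assume ORTModelForSequenceClassification with export=True goes through the same exporter, so it may hit the same problem):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "microsoft/deberta-v3-base"  # or a local fine-tuned checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.save_pretrained("output-dir")
tokenizer.save_pretrained("output-dir")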

Thanks!

marcovzla added the bug label Oct 21, 2024