Problem converting DeBERTaV3 to ONNX using optimum-cli #2075

Open
2 of 4 tasks
marcovzla opened this issue Oct 21, 2024 · 0 comments
Labels
bug Something isn't working

Comments

marcovzla commented Oct 21, 2024

System Info

$ pip freeze | grep optimum
optimum==1.23.1

$ python -V
Python 3.11.2

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

$ optimum-cli export onnx \
    --framework pt \
    --model microsoft/deberta-v3-base \
    --task text-classification \
    output-dir
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py:558: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:547: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(mid - 1).type_as(relative_pos),
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:551: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.ceil(torch.log(abs_pos / mid) / torch.log(torch.tensor((max_position - 1) / mid)) * (mid - 1)) + mid
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:710: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:710: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:785: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:785: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:797: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:797: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:798: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if key_layer.size(-2) != query_layer.size(-2):
/home/marcovalenzuelaescarcega/.virtualenvs/onnx/lib/python3.11/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:105: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))

Expected behavior

I am trying to export my own fine-tuned DeBERTaV3 model and get the TracerWarnings shown above. Here I am using microsoft/deberta-v3-base as an example, so please ignore the warning about some weights being newly initialized. The real problem is the TracerWarnings: an ONNX model is actually generated, but it is incorrect. It seems to always predict the same label.
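To illustrate, this is roughly how I am checking the exported model (a minimal sketch; the paths and model id match the reproduction above, and I only feed the inputs the exported graph actually declares):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# the export above writes output-dir/model.onnx
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
session = ort.InferenceSession("output-dir/model.onnx")

for text in ["great movie", "terrible movie", "the sky is blue"]:
    enc = tokenizer(text, return_tensors="np")
    # keep only the inputs the ONNX graph expects
    feed = {i.name: enc[i.name] for i in session.get_inputs() if i.name in enc}
    logits = session.run(None, feed)[0]
    print(text, np.argmax(logits, axis=-1))

In my runs the argmax is the same for every input.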

I also see similar TracerWarnings with microsoft/deberta-v2-xlarge, and fewer TracerWarnings with microsoft/deberta-base, but I haven't checked whether those converted models are incorrect too.
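In case it is useful, my understanding of why these warnings matter (a toy example of my own, not taken from the DeBERTa code): tracing bakes the result of torch.tensor(...) into the graph as a constant, so anything computed from it stops depending on the actual inputs.

import torch

def scale_by_half_width(x):
    mid = x.size(-1) // 2         # a plain Python int
    return x * torch.tensor(mid)  # traced as a constant -> TracerWarning

traced = torch.jit.trace(scale_by_half_width, torch.ones(4))
print(traced(torch.ones(4)))  # tensor([2., 2., 2., 2.]), correct for the traced shape
print(traced(torch.ones(8)))  # still multiplies by 2, not 4: the constant was baked in

If something similar happens in the relative-position code, the exported graph would only be right for the dummy shapes used during tracing.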

If optimum-cli export onnx doesn't support DeBERTaV3, I would appreciate a pointer on how to do the conversion from code, or any other possible solution.
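For reference, the programmatic route I am aware of (a sketch; I assume ORTModelForSequenceClassification with export=True goes through the same exporter, so it may hit the same problem):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "microsoft/deberta-v3-base"  # or a local fine-tuned checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.save_pretrained("output-dir")
tokenizer.save_pretrained("output-dir")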

Thanks!

marcovzla added the bug label Oct 21, 2024