
[Feature Request] Model Version Compatible with NVIDIA A100 (Ampere Architecture) or Alternative Deployment Methods #359

Open
XDeviation opened this issue Feb 10, 2025 · 2 comments


@XDeviation

I am trying to use the non-quantized version of your open-source large model, which requires the fp8e4nv data type. However, fp8e4nv is not supported on NVIDIA Ampere-architecture GPUs (e.g., the A100) due to hardware-level limitations, which makes it impossible to run the model on A100 GPUs.
Since the A100 is widely used in research and production environments, it would be highly beneficial if you could provide a version of the model compatible with the Ampere architecture. Alternatively, if there are other ways to deploy the model on A100-based systems, please consider documenting or supporting them.
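
For reference, the architecture gap can be confirmed from the reported compute capability. A minimal sketch using `torch` (the A100 reports sm_80, while the assertion in the traceback below requires arch >= 89):

```python
import torch

# The A100 (Ampere) reports compute capability (8, 0); the Triton assertion
# in the traceback below requires CUDA arch >= 89 (Ada/Hopper) for fp8e4nv
# (torch.float8_e4m3fn).
major, minor = torch.cuda.get_device_capability()
supported = (major, minor) >= (8, 9)
print(f"sm_{major}{minor}: fp8e4nv {'supported' if supported else 'NOT supported'}")
```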

Details

Error encountered:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 35, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 993, in to
[rank0]:     return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 759, in cast
[rank0]:     assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
@XDeviation (Author)

By the way, for the same reason (lack of fp8e4nv support), I am also unable to convert the model to the GGUF format using tools like llama.cpp. This further limits the model's usability in environments where GGUF is commonly used.

INFO:hf-to-gguf:Loading model: DeepSeek-R1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000163.safetensors'
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> F16, shape = {7168, 129280}
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.float8_e4m3fn --> F16, shape = {18432, 7168}
Traceback (most recent call last):
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
    main()
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5106, in main
    model_instance.write()
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 439, in write
    self.prepare_tensors()
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4142, in prepare_tensors
    super().prepare_tensors()
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 298, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4139, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 214, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
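
A possible workaround would be to dequantize the FP8 tensors back to BF16 before running the converter, so that no float8_e4m3fn weights or `*_scale_inv` tensors remain in the checkpoint. A minimal sketch, assuming DeepSeek's block-wise FP8 layout (one `weight_scale_inv` entry per 128×128 tile of the weight, multiplied back in during dequantization; the helper name and the block size are assumptions based on that layout):

```python
import torch

def dequant_fp8_block(weight: torch.Tensor, scale_inv: torch.Tensor,
                      block: int = 128) -> torch.Tensor:
    # Hypothetical helper: assumes one scale per (block x block) tile of
    # `weight` (float8_e4m3fn), multiplied back in to recover the values,
    # matching the assumed DeepSeek block-quantization layout.
    w = weight.to(torch.float32)
    # Expand each per-tile scale to cover its full block, then trim any
    # padding left over when the dimensions are not multiples of `block`.
    s = scale_inv.to(torch.float32)
    s = s.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    s = s[: w.shape[0], : w.shape[1]]
    return (w * s).to(torch.bfloat16)
```

Applying this to every FP8 weight (and dropping the `weight_scale_inv` tensors) should leave a plain BF16 checkpoint that `convert_hf_to_gguf.py` can map.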

@haowang-cqu

Hello,

arcee-ai/DeepSeek-R1-bf16 is a BF16 conversion of the original FP8 weights.
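
For example, it can be fetched with `huggingface_hub` (a sketch; the local directory is arbitrary, and note that a BF16 checkpoint is roughly twice the size of the FP8 one):

```python
from huggingface_hub import snapshot_download

# Download the community BF16 conversion instead of the original FP8 weights.
snapshot_download(repo_id="arcee-ai/DeepSeek-R1-bf16",
                  local_dir="DeepSeek-R1-bf16")
```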
