I am trying to run the non-quantized (FP8) release of your open-source large model, which relies on the fp8e4nv data type. However, fp8e4nv is not supported on NVIDIA Ampere-architecture GPUs such as the A100 (CUDA compute capability 8.0, below the required 8.9) due to hardware-level limitations, so the released weights cannot run on A100 GPUs as-is.
Since the A100 is widely used in research and production environments, it would be very helpful if you could publish a version of the model compatible with Ampere-architecture GPUs. Alternatively, if there are other ways to deploy the model on A100-based systems, please consider documenting or supporting them. (A pre-flight check is sketched below.)
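For reference, the incompatibility can be detected before any weights are loaded: Triton's fp8e4nv path requires CUDA compute capability 8.9 or newer (Ada/Hopper), which PyTorch can report directly. A minimal sketch:

```python
# Check whether the local GPU supports Triton's fp8e4nv path
# (needs CUDA compute capability >= 8.9, e.g. L40S, RTX 4090, H100).
import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (8, 9):
    print(f"sm_{major}{minor} detected: fp8e4nv is unavailable; "
          "dequantize the FP8 checkpoint to BF16 before serving.")
```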
GPU configuration: 2 x 8 x A100-SXM4-80GB (two machines serving one model through the vLLM framework, communicating over InfiniBand NICs)
Error Encountered:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 35, in wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 993, in to
[rank0]: return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 759, in cast
[rank0]: assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
```
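As a possible workaround on pre-Ada GPUs, the FP8 checkpoint could be dequantized to BF16 offline and then served as an ordinary BF16 model. Below is a minimal sketch of that dequantization under an assumed layout: each float8_e4m3fn weight has a companion weight_scale_inv tensor holding one scale per 128x128 block (the tensor name matches the GGUF error further down; the block size and the multiply-by-scale convention are assumptions to verify against the model card and released conversion scripts):

```python
# Sketch: dequantize one block-quantized FP8 weight to BF16.
# Assumes one scale per 128x128 tile stored in `weight_scale_inv`,
# applied by multiplication; verify both assumptions against the
# model's own conversion code before relying on this.
import torch

BLOCK = 128  # assumed quantization block size

def dequantize_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Expand per-block scales to element granularity and apply them."""
    rows, cols = weight.shape
    # Broadcast each block scale across its 128x128 tile, then crop
    # to the weight's exact shape (trailing blocks may be partial).
    scale = scale_inv.repeat_interleave(BLOCK, dim=0)
    scale = scale.repeat_interleave(BLOCK, dim=1)[:rows, :cols]
    return (weight.to(torch.float32) * scale).to(torch.bfloat16)
```

After rewriting every FP8 tensor this way (and dropping the scale tensors), the checkpoint should load on A100 with a plain BF16 dtype.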
By the way, for the same reason (lack of support for fp8e4nv), I am also unable to convert the model to the GGUF format using tools like llama.cpp, which further limits its usability in environments where GGUF is common (a pre-conversion check is sketched after the traceback below):
```
INFO:hf-to-gguf:Loading model: DeepSeek-R1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000163.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {7168, 129280}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float8_e4m3fn --> F16, shape = {18432, 7168}
Traceback (most recent call last):
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
main()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5106, in main
model_instance.write()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 439, in write
self.prepare_tensors()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4142, in prepare_tensors
super().prepare_tensors()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 298, in prepare_tensors
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4139, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 214, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
```
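The unmapped tensor is one of the FP8 block scales, for which convert_hf_to_gguf.py has no mapping. If the checkpoint is first dequantized to BF16 (folding each weight_scale_inv into its weight, as sketched earlier), these tensors disappear and the conversion can proceed. A small, hypothetical sanity check before rerunning the converter:

```python
# Hypothetical pre-conversion check: list any FP8 scale tensors still
# present in the checkpoint index; convert_hf_to_gguf.py cannot map
# these and will fail as shown above.
import json

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

leftovers = [name for name in weight_map if name.endswith("weight_scale_inv")]
if leftovers:
    print(f"{len(leftovers)} scale tensors remain (e.g. {leftovers[0]}); "
          "dequantize to BF16 before converting to GGUF.")
```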