I am trying to run the non-quantized (FP8) release of your open-source large model, which relies on the fp8e4nv data type. However, fp8e4nv is not supported on NVIDIA Ampere-architecture GPUs such as the A100 (CUDA compute capability 8.0, below the required 8.9) due to hardware-level limitations, so the released weights cannot run on A100 GPUs as-is.
Since the A100 is widely used in research and production environments, it would be very helpful if you could publish a version of the model compatible with Ampere-architecture GPUs. Alternatively, if there are other ways to deploy the model on A100-based systems, please consider documenting or supporting them. (A pre-flight check is sketched below.)
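For reference, the incompatibility can be detected before any weights are loaded: Triton's fp8e4nv path requires CUDA compute capability 8.9 or newer (Ada/Hopper), which PyTorch can report directly. A minimal sketch:

```python
# Check whether the local GPU supports Triton's fp8e4nv path
# (needs CUDA compute capability >= 8.9, e.g. L40S, RTX 4090, H100).
import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (8, 9):
    print(f"sm_{major}{minor} detected: fp8e4nv is unavailable; "
          "dequantize the FP8 checkpoint to BF16 before serving.")
```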
GPU configuration: 2 x 8 x A100-SXM4-80GB (two machines serving one model through the vLLM framework, communicating over InfiniBand NICs)
Error Encountered:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 35, in wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 993, in to
[rank0]: return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 759, in cast
[rank0]: assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
```
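As a possible workaround on pre-Ada GPUs, the FP8 checkpoint could be dequantized to BF16 offline and then served as an ordinary BF16 model. Below is a minimal sketch of that dequantization under an assumed layout: each float8_e4m3fn weight has a companion weight_scale_inv tensor holding one scale per 128x128 block (the tensor name matches the GGUF error further down; the block size and the multiply-by-scale convention are assumptions to verify against the model card and released conversion scripts):

```python
# Sketch: dequantize one block-quantized FP8 weight to BF16.
# Assumes one scale per 128x128 tile stored in `weight_scale_inv`,
# applied by multiplication; verify both assumptions against the
# model's own conversion code before relying on this.
import torch

BLOCK = 128  # assumed quantization block size

def dequantize_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Expand per-block scales to element granularity and apply them."""
    rows, cols = weight.shape
    # Broadcast each block scale across its 128x128 tile, then crop
    # to the weight's exact shape (trailing blocks may be partial).
    scale = scale_inv.repeat_interleave(BLOCK, dim=0)
    scale = scale.repeat_interleave(BLOCK, dim=1)[:rows, :cols]
    return (weight.to(torch.float32) * scale).to(torch.bfloat16)
```

After rewriting every FP8 tensor this way (and dropping the scale tensors), the checkpoint should load on A100 with a plain BF16 dtype.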
By the way, for the same reason (lack of support for fp8e4nv), I am also unable to convert the model to the GGUF format using tools like llama.cpp, which further limits its usability in environments where GGUF is common (a pre-conversion check is sketched after the traceback below):
```
INFO:hf-to-gguf:Loading model: DeepSeek-R1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000163.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {7168, 129280}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float8_e4m3fn --> F16, shape = {18432, 7168}
Traceback (most recent call last):
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
main()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 5106, in main
model_instance.write()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 439, in write
self.prepare_tensors()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4142, in prepare_tensors
super().prepare_tensors()
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 298, in prepare_tensors
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 4139, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
File "/mnt/disk3/llama.cpp/convert_hf_to_gguf.py", line 214, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
```
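The unmapped tensor is one of the FP8 block scales, for which convert_hf_to_gguf.py has no mapping. If the checkpoint is first dequantized to BF16 (folding each weight_scale_inv into its weight, as sketched earlier), these tensors disappear and the conversion can proceed. A small, hypothetical sanity check before rerunning the converter:

```python
# Hypothetical pre-conversion check: list any FP8 scale tensors still
# present in the checkpoint index; convert_hf_to_gguf.py cannot map
# these and will fail as shown above.
import json

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

leftovers = [name for name in weight_map if name.endswith("weight_scale_inv")]
if leftovers:
    print(f"{len(leftovers)} scale tensors remain (e.g. {leftovers[0]}); "
          "dequantize to BF16 before converting to GGUF.")
```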