[Bug] The batch decoding speed of DeepSeek V3 is too slow. #3102
Unanswered
SonChoulJun
asked this question in
Q&A
Replies: 2 comments 2 replies
-
Does the GPU environment have InfiniBand? |
Beta Was this translation helpful? Give feedback.
2 replies
-
Why add |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Checklist
Describe the bug
The hardware setup consists of H100 * 8 across 2 nodes.
The execution command is as follows:
python3 -m sglang.launch_server --model-path /workspace/DBP/sllm_mount/models/DeepSeek-V3 --tp 16 --dist-init-addr 192.168.1.10:20000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 8010 --disable-cuda-graph
Reproduction
deepseek-V3
Environment
The hardware setup consists of H100 * 8 across 2 nodes.
Beta Was this translation helpful? Give feedback.
All reactions