[HOWTO] how to set --num-prompts when benchmarking #816

Closed Answered by Ying1123
lxww302 asked this question in Q&A

If the request rate exceeds the server's capacity, requests queue up. A larger number of prompts then leads to higher E2E latency, because each additional request waits behind a longer backlog.
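To make the queueing effect concrete, here is a minimal sketch of a toy model. It assumes a single FIFO server that handles one request at a time, with evenly spaced arrivals and a fixed service time. A real serving engine batches requests, so this is not how the actual scheduler works. The function name and parameters are hypothetical and only illustrate the trend.

```python
def mean_e2e_latency(num_prompts, request_rate, service_rate):
    """Mean end-to-end latency (seconds) in a toy single-server FIFO queue.

    Assumptions (illustrative only, not the real server's behavior):
    - requests arrive every 1/request_rate seconds
    - each request takes exactly 1/service_rate seconds to serve
    - the server handles one request at a time, in arrival order
    """
    interarrival = 1.0 / request_rate
    service_time = 1.0 / service_rate
    server_free_at = 0.0
    total_latency = 0.0
    for i in range(num_prompts):
        arrival = i * interarrival
        # A request starts once it has arrived and the server is idle.
        start = max(arrival, server_free_at)
        finish = start + service_time
        server_free_at = finish
        total_latency += finish - arrival
    return total_latency / num_prompts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        overloaded = mean_e2e_latency(n, request_rate=10, service_rate=5)
        underloaded = mean_e2e_latency(n, request_rate=2, service_rate=5)
        print(f"num_prompts={n}: overloaded={overloaded:.2f}s, "
              f"underloaded={underloaded:.2f}s")
```

In this model, when the request rate is above capacity the mean latency grows with the number of prompts, because the backlog keeps growing for as long as requests keep arriving. When the request rate is below capacity, the mean latency stays at the service time regardless of how many prompts are sent.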

Answer selected by merrymercy

This discussion was converted from issue #808 on July 30, 2024 03:19.