The code in this repo was used to produce the blog post "How continuous batching enables 23x throughput in LLM inference while reducing p50 latency."