The code in this repo was used to produce the blog post "How continuous batching enables 23x throughput in LLM inference while reducing p50 latency."