This repository was archived by the owner on May 28, 2024. It is now read-only.

Comparing changes

base repository: ray-project/ray-llm
base: v0.3.1
head repository: ray-project/ray-llm
compare: master

Commits on Oct 4, 2023

  1. Set all route_prefixes to /

    Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    shrekris-anyscale committed Oct 4, 2023
    ce099fc
  2. Use name 'ray-llm' for all Serve apps

    Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    shrekris-anyscale committed Oct 4, 2023
    74745d8
  3. Merge pull request #69 from ray-project/remove_route_prefixes_configs

    Set `route_prefixes` in Serve configs to `/`
    richardliaw authored Oct 4, 2023
    e5c11cb

Commits on Oct 6, 2023

  1. Update call to get controller

    Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    shrekris-anyscale committed Oct 6, 2023
    d70c9b8

Commits on Oct 10, 2023

  1. Merge pull request #71 from ray-project/fix_serve_client_call

    Update method of accessing Serve controller
    shrekris-anyscale authored Oct 10, 2023
    9f681c9

Commits on Oct 21, 2023

  1. Update README.md with link to kuberay instructions

    Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
    akshay-anyscale authored Oct 21, 2023
    121fcf7
  2. b3560aa

Commits on Oct 24, 2023

  1. 0.4.0 release

    Signed-off-by: Avnish Narayan <avnish@anyscale.com>
    avnishn committed Oct 24, 2023
    d35809b

Commits on Oct 26, 2023

  1. 0.4.0 release

    Signed-off-by: Avnish Narayan <avnish@anyscale.com>
    avnishn authored and avnish narayan committed Oct 26, 2023
    d3569e8
  2. Bump version number

    Signed-off-by: avnish narayan <avnish@avnish.local.meter>
    avnish narayan committed Oct 26, 2023
    4b56385
  3. 96d7fb1
  4. Additional commits for updating readme, kubernetes docs, add falcon 7b, and update vllm_compatibility
    
    Signed-off-by: Avnish Narayan <avnish@anyscale.com>
    avnishn committed Oct 26, 2023
    83a54a1

Commits on Oct 28, 2023

  1. Merge pull request #79 from avnishn/0.4.0

    0.4.0 release
    
    The following changes are introduced:

    - Renaming aviary to rayllm.
    - Support for reading models from GCS in addition to AWS S3.
    - Increased testing for prompting.
    - New model configs for Falcon 7B and 40B.
    - Make frontend compatible with Ray Serve 2.7.
    
    
    Co-authored-by: Avnish Narayan <avnish@anyscale.com>
    Co-authored-by: Chris Sivanich <csivanich@anyscale.com>
    Co-authored-by: Tanmay Chordia <tchordia@gmail.com>
    Co-authored-by: Sihan Wang <sihanwang41@gmail.com>
    Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    Co-authored-by: Richard Liaw <rliaw@anyscale.com>
    7 people authored Oct 28, 2023
    c2a22af
  2. add awq quantized model

    Yiqing Wang committed Oct 28, 2023
    eecf941

Commits on Nov 1, 2023

  1. Doc/Config update for rayllm

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Nov 1, 2023
    ab69d9f
  2. Add more

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Nov 1, 2023
    81aad72
  3. Merge pull request #85 from sihanwang41/doc_cherrypick

    Doc/Config update for rayllm
    sihanwang41 authored Nov 1, 2023
    02ca73c

Commits on Nov 2, 2023

  1. update doc

    Yiqing Wang committed Nov 2, 2023
    7fc0d82
  2. merge from master

    Yiqing Wang committed Nov 2, 2023
    0d1e001
  3. address comments

    Yiqing Wang committed Nov 2, 2023
    efb89cf
  4. add serve config

    Yiqing Wang committed Nov 2, 2023
    be402f0
  5. Rename ray-llm docker image name in doc

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Nov 2, 2023
    fef5eea
  6. Merge pull request #87 from sihanwang41/rename_docker

    Rename ray-llm docker image name in doc
    sihanwang41 authored Nov 2, 2023
    8fd2dc9

Commits on Nov 8, 2023

  1. fix: no attribute 'set_url'

    Mike Arov committed Nov 8, 2023
    ea61d98
  2. revert readme

    Yiqing Wang committed Nov 8, 2023
    98bad43

Commits on Nov 9, 2023

  1. fix: frontend ignores MONGODB_URL

    Mike Arov committed Nov 9, 2023
    3eb6892

Commits on Nov 13, 2023

  1. Merge pull request #82 from YQ-Wang/awq-model

    Add AWQ Quantized Llama 2 70B Model Config & Update README
    shrekris-anyscale authored Nov 13, 2023
    ae910a2

Commits on Nov 15, 2023

  1. Merge pull request #92 from marov/master

    fix: no attribute 'set_url'
    sihanwang41 authored Nov 15, 2023
    335e688

Commits on Nov 16, 2023

  1. add awq and squeezellm configs

    uvikas committed Nov 16, 2023
    793ebce

Commits on Nov 20, 2023

  1. add llmperf benchmarks

    uvikas committed Nov 20, 2023
    0557452
  2. rename

    uvikas committed Nov 20, 2023
    436f478
  3. link quantization guide

    uvikas committed Nov 20, 2023
    2aa47f3
  4. Merge pull request #95 from ray-project/quantization

    Add AWQ and SqueezeLLM quantization configs
    uvikas authored Nov 20, 2023
    fa3a766

Commits on Dec 6, 2023

  1. Move telemetry line

    Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    shrekris-anyscale committed Dec 6, 2023
    d2518d3
  2. Merge pull request #102 from ray-project/shrekris/track_usage_serve_configs
    
    Record telemetry when RayLLM is launched using a Serve config
    shrekris-anyscale authored Dec 6, 2023
    78e076e

Commits on Jan 8, 2024

  1. test

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Jan 8, 2024
    842537b
  2. Update

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Jan 8, 2024
    bb472cd
  3. Update

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Jan 8, 2024
    c3aaacc
  4. Update

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Jan 8, 2024
    90274cc
  5. Update

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 committed Jan 8, 2024
    f2305fd
  6. Merge pull request #115 from sihanwang41/test

    Update doc build ci job
    sihanwang41 authored Jan 8, 2024
    ca95107

Commits on Jan 18, 2024

  1. Release 0.5.0 (#111)

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
    sihanwang41 and shrekris-anyscale authored Jan 18, 2024
    cbf88c2
  2. Update vllm version (#119)

    Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
    sihanwang41 authored Jan 18, 2024
    5255abe

Commits on Jan 25, 2024

  1. Add more details about prompt format in the docs (#126)

    Trying to make it easier for users to self-service add custom models to
    use with ray-llm.
    
    ---------
    
    Signed-off-by: Alan Guo <aguo@anyscale.com>
    alanwguo authored Jan 25, 2024
    f6926b7

Commits on Jan 26, 2024

  1. Add examples to the prompt format docs (#128)

    Signed-off-by: Alan Guo <aguo@anyscale.com>
    alanwguo authored Jan 26, 2024
    6b86748

Commits on Mar 26, 2024

  1. a36a602

Commits on Mar 27, 2024

  1. follow up on removing explorer from readme (#146)

    I noted a minor issue when verifying the changes from
    #145
    ArturNiederfahrenhorst authored Mar 27, 2024
    9251fb5

Commits on May 28, 2024

  1. Update README.md for rayllm

    Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
    akshay-anyscale authored May 28, 2024
    400df51
  2. Update README.md

    Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
    akshay-anyscale authored May 28, 2024
    6f83cca
  3. Update README.md for RayLLM archival (#152)

    Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
    akshay-anyscale authored May 28, 2024
    d8c7b81
Showing with 5,359 additions and 1,931 deletions.
  1. +1 −4 .github/workflows/docs.yaml
  2. +26 −0 .github/workflows/docs_build.yaml
  3. +5 −0 .gitignore
  4. +1 −1 .pre-commit-config.yaml
  5. +10 −3 Dockerfile
  6. +2 −2 MANIFEST.in
  7. +24 −43 README.md
  8. +0 −6 aviary/__init__.py
  9. +0 −4 aviary/backend/__init__.py
  10. +0 −14 aviary/backend/llm/dict_utils.py
  11. +0 −414 aviary/backend/llm/utils.py
  12. +0 −210 aviary/backend/llm/vllm/vllm_models.py
  13. +0 −3 aviary/backend/observability/tracing/__init__.py
  14. +0 −10 aviary/backend/server/openai_compat/openai_exception.py
  15. +0 −30 aviary/backend/server/openai_compat/openai_middleware.py
  16. +0 −75 aviary/backend/server/plugins/deployment_base_client.py
  17. +0 −12 aviary/backend/server/routers/middleware.py
  18. +0 −383 aviary/backend/server/routers/router_app.py
  19. +0 −136 aviary/backend/server/run.py
  20. +0 −3 aviary/env_conf.py
  21. 0 build_aviary_wheel.sh → build_rayllm_wheel.sh
  22. +0 −2 deploy.sh
  23. +2 −2 deploy/ray/backend.yaml
  24. +5 −5 deploy/ray/{aviary-cluster.yaml → rayllm-cluster.yaml}
  25. +51 −0 docs/DOCKERHUB.md
  26. +10 −10 docs/kuberay/deploy-on-eks.md
  27. +9 −9 docs/kuberay/deploy-on-gke.md
  28. +3 −3 docs/kuberay/{ray-cluster.aviary-eks.yaml → ray-cluster.rayllm-eks.yaml}
  29. +3 −3 docs/kuberay/{ray-cluster.aviary-gke.yaml → ray-cluster.rayllm-gke.yaml}
  30. +4 −4 docs/kuberay/{ray-service.aviary-eks.yaml → ray-service.rayllm-eks.yaml}
  31. +4 −4 docs/kuberay/{ray-service.aviary-gke.yaml → ray-service.rayllm-gke.yaml}
  32. +2 −2 mkdocs.yml
  33. +103 −12 models/README.md
  34. +43 −0 models/continuous_batching/OpenAssistant--falcon-40b-sft-top1-560.yaml
  35. +44 −0 models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
  36. +1 −1 models/continuous_batching/amazon--LightGPT.yaml
  37. +4 −4 models/continuous_batching/codellama--CodeLlama-34b-Instruct-hf.yaml
  38. +4 −4 models/continuous_batching/meta-llama--Llama-2-13b-chat-hf.yaml
  39. +4 −4 models/continuous_batching/meta-llama--Llama-2-70b-chat-hf.yaml
  40. +4 −4 models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
  41. +41 −0 models/continuous_batching/mistralai--Mixtral-8-7b-Instruct-v01.yaml
  42. +28 −0 models/continuous_batching/quantization/README.md
  43. +40 −0 models/continuous_batching/quantization/TheBloke--Llama-2-13B-chat-AWQ.yaml
  44. +40 −0 models/continuous_batching/quantization/TheBloke--Llama-2-70B-chat-AWQ.yaml
  45. +42 −0 models/continuous_batching/quantization/TheBloke--Llama-2-7B-chat-AWQ.yaml
  46. +40 −0 models/continuous_batching/quantization/squeeze-ai-lab--sq-llama-2-13b-w4-s0.yaml
  47. +42 −0 models/continuous_batching/quantization/squeeze-ai-lab--sq-llama-2-7b-w4-s0.yaml
  48. +39 −0 models/continuous_batching/trtllm-meta-llama--Llama-2-70b-chat-hf.yaml
  49. +39 −0 models/continuous_batching/trtllm-meta-llama--Llama-2-7b-chat-hf.yaml
  50. +1 −1 pyproject.toml
  51. +6 −0 rayllm/__init__.py
  52. +4 −0 rayllm/backend/__init__.py
  53. 0 {aviary → rayllm}/backend/llm/__init__.py
  54. +15 −0 rayllm/backend/llm/dict_utils.py
  55. 0 {aviary/backend/llm/vllm → rayllm/backend/llm/embedding}/__init__.py
  56. +203 −0 rayllm/backend/llm/embedding/embedding_engine.py
  57. +163 −0 rayllm/backend/llm/embedding/embedding_model_runner.py
  58. +45 −0 rayllm/backend/llm/embedding/embedding_models.py
  59. +3 −3 {aviary → rayllm}/backend/llm/engine/base.py
  60. 0 {aviary → rayllm}/backend/llm/engine/stats.py
  61. +2 −0 {aviary → rayllm}/backend/llm/error_handling.py
  62. +2 −1 {aviary → rayllm}/backend/llm/generation.py
  63. +95 −0 rayllm/backend/llm/llm_node_initializer.py
  64. +1 −1 {aviary → rayllm}/backend/llm/tokenizer.py
  65. 0 {aviary/backend/llm/vllm/metrics → rayllm/backend/llm/trtllm}/__init__.py
  66. +150 −0 rayllm/backend/llm/trtllm/trtllm_engine.py
  67. +157 −0 rayllm/backend/llm/trtllm/trtllm_models.py
  68. +37 −0 rayllm/backend/llm/trtllm/trtllm_mpi.py
  69. +831 −0 rayllm/backend/llm/utils.py
  70. 0 {aviary/backend/observability → rayllm/backend/llm/vllm}/__init__.py
  71. 0 {aviary/backend/server → rayllm/backend/llm/vllm/metrics}/__init__.py
  72. +9 −9 {aviary → rayllm}/backend/llm/vllm/metrics/vllm_compatibility.py
  73. +1 −1 {aviary → rayllm}/backend/llm/vllm/util.py
  74. +57 −33 {aviary → rayllm}/backend/llm/vllm/vllm_compatibility.py
  75. +154 −27 {aviary → rayllm}/backend/llm/vllm/vllm_engine.py
  76. +68 −9 {aviary → rayllm}/backend/llm/vllm/vllm_engine_stats.py
  77. +56 −0 rayllm/backend/llm/vllm/vllm_models.py
  78. +5 −3 {aviary → rayllm}/backend/llm/vllm/vllm_node_initializer.py
  79. 0 {aviary → rayllm}/backend/logger.py
  80. 0 {aviary/backend/server/openai_compat → rayllm/backend/observability}/__init__.py
  81. +1 −1 {aviary → rayllm}/backend/observability/base.py
  82. 0 {aviary → rayllm}/backend/observability/event_loop_monitoring.py
  83. +79 −30 {aviary → rayllm}/backend/observability/fn_call_metrics.py
  84. +1 −1 {aviary → rayllm}/backend/observability/inference_worker_metrics.py
  85. 0 {aviary → rayllm}/backend/observability/loggers.py
  86. 0 {aviary → rayllm}/backend/observability/metrics.py
  87. +1 −1 {aviary → rayllm}/backend/observability/request_context.py
  88. +2 −2 {aviary → rayllm}/backend/observability/telemetry.py
  89. +3 −0 rayllm/backend/observability/tracing/__init__.py
  90. +1 −1 {aviary → rayllm}/backend/observability/tracing/baggage.py
  91. 0 {aviary → rayllm}/backend/observability/tracing/baggage_span_processor.py
  92. 0 {aviary → rayllm}/backend/observability/tracing/context.py
  93. 0 {aviary → rayllm}/backend/observability/tracing/fastapi.py
  94. +7 −5 {aviary → rayllm}/backend/observability/tracing/setup.py
  95. 0 {aviary → rayllm}/backend/observability/tracing/threading.py
  96. +43 −0 rayllm/backend/observability/tracing/threading_propagator.py
  97. 0 {aviary/backend/server/plugins → rayllm/backend/server}/__init__.py
  98. +6 −2 {aviary → rayllm}/backend/server/app.py
  99. +2 −0 rayllm/backend/server/constants.py
  100. 0 {aviary/backend/server/routers → rayllm/backend/server/embedding}/__init__.py
  101. +112 −0 rayllm/backend/server/embedding/embedding_deployment.py
  102. +2 −2 {aviary → rayllm}/backend/server/metrics.py
  103. +313 −34 {aviary → rayllm}/backend/server/models.py
  104. 0 {aviary/backend/server/vllm → rayllm/backend/server/openai_compat}/__init__.py
  105. +28 −0 rayllm/backend/server/openai_compat/openai_exception.py
  106. +28 −0 rayllm/backend/server/openai_compat/openai_middleware.py
  107. 0 {aviary → rayllm}/backend/server/openai_compat/openai_model_util.py
  108. 0 {aviary/common → rayllm/backend/server/plugins}/__init__.py
  109. +89 −0 rayllm/backend/server/plugins/deployment_base_client.py
  110. +27 −7 {aviary → rayllm}/backend/server/plugins/execution_hooks.py
  111. +24 −7 {aviary → rayllm}/backend/server/plugins/multi_query_client.py
  112. +23 −52 {aviary → rayllm}/backend/server/plugins/router_query_engine.py
  113. +13 −19 {aviary → rayllm}/backend/server/plugins/serve_application_query_client.py
  114. 0 {aviary/frontend → rayllm/backend/server/routers}/__init__.py
  115. +14 −0 rayllm/backend/server/routers/middleware.py
  116. +571 −0 rayllm/backend/server/routers/router_app.py
  117. +222 −0 rayllm/backend/server/run.py
  118. 0 {aviary/testing → rayllm/backend/server/trtllm}/__init__.py
  119. +50 −0 rayllm/backend/server/trtllm/trtllm_deployment.py
  120. +89 −28 {aviary → rayllm}/backend/server/utils.py
  121. 0 rayllm/backend/server/vllm/__init__.py
  122. +5 −6 {aviary → rayllm}/backend/server/vllm/vllm_deployment.py
  123. +2 −2 {aviary → rayllm}/cli.py
  124. 0 rayllm/common/__init__.py
  125. +1 −1 {aviary → rayllm}/common/constants.py
  126. 0 {aviary → rayllm}/common/evaluation.py
  127. 0 {aviary → rayllm}/common/llm_event.py
  128. +189 −8 {aviary → rayllm}/common/models.py
  129. +18 −1 {aviary → rayllm}/common/utils.py
  130. +1 −1 {aviary → rayllm}/conf.py
  131. +6 −0 rayllm/env_conf.py
  132. 0 rayllm/frontend/__init__.py
  133. +21 −14 {aviary → rayllm}/frontend/app.py
  134. +3 −3 {aviary → rayllm}/frontend/async_sdk.py
  135. +12 −17 {aviary → rayllm}/frontend/endpoints_sdk.py
  136. 0 {aviary → rayllm}/frontend/javascript/aviary.js
  137. +2 −1 {aviary → rayllm}/frontend/javascript_loader.py
  138. +1 −1 {aviary → rayllm}/frontend/leaderboard.py
  139. +4 −4 {aviary → rayllm}/frontend/mongo_logger.py
  140. 0 {aviary → rayllm}/frontend/mongo_secrets.py
  141. 0 {aviary → rayllm}/frontend/types.py
  142. +3 −3 {aviary → rayllm}/frontend/utils.py
  143. +44 −49 {aviary → rayllm}/sdk.py
  144. 0 rayllm/testing/__init__.py
  145. +26 −11 {aviary → rayllm}/testing/mock_deployment.py
  146. +4 −4 {aviary → rayllm}/testing/mock_run.py
  147. +57 −10 {aviary → rayllm}/testing/mock_vllm_engine.py
  148. +3 −7 requirements-backend.txt
  149. +3 −3 requirements-dev.txt
  150. +7 −0 serve_configs/OpenAssistant--falcon-40b-sft-top1-560.yaml
  151. +7 −0 serve_configs/OpenAssistant--falcon-7b-sft-top1-696.yaml
  152. +7 −0 serve_configs/TheBloke--Llama-2-13B-chat-AWQ.yaml
  153. +7 −0 serve_configs/TheBloke--Llama-2-70B-chat-AWQ.yaml
  154. +7 −0 serve_configs/TheBloke--Llama-2-7B-chat-AWQ.yaml
  155. +3 −3 serve_configs/amazon--LightGPT.yaml
  156. +3 −3 serve_configs/codellama--CodeLlama-34b-Instruct-hf.yaml
  157. +3 −3 serve_configs/meta-llama--Llama-2-13b-chat-hf.yaml
  158. +3 −3 serve_configs/meta-llama--Llama-2-70b-chat-hf.yaml
  159. +3 −3 serve_configs/meta-llama--Llama-2-7b-chat-hf.yaml
  160. +7 −0 serve_configs/mistralai--Mixtral-8-7b-Instruct-v01.yaml
  161. +7 −0 serve_configs/squeeze-ai-lab--sq-llama-2-13b-w4-s0.yaml
  162. +7 −0 serve_configs/squeeze-ai-lab--sq-llama-2-7b-w4-s0.yaml
  163. +7 −0 serve_configs/thenlper--gte-large.yaml
  164. +7 −0 serve_configs/trtllm-meta-llama--Llama-2-70b-chat-hf.yaml
  165. +7 −0 serve_configs/trtllm-meta-llama--Llama-2-7b-chat-hf.yaml
  166. +5 −5 setup.py
  167. +23 −8 tests/conftest.py
  168. +1 −1 tests/integration/test_cli.py
  169. +1 −1 tests/integration/test_frontend.py
  170. +1 −1 tests/integration/test_openai_compatibility.py
  171. +1 −1 tests/integration/test_sdk.py
  172. +3 −10 tests/test_aviary/backend/conftest.py
  173. +44 −2 tests/test_aviary/backend/llm/test_utils.py
  174. +122 −0 tests/test_aviary/backend/observability/test_fn_call_metrics.py
  175. +1 −1 tests/test_aviary/backend/observability/test_request_context.py
  176. +4 −4 tests/test_aviary/backend/server/plugins/test_multi_query_client.py
  177. +10 −10 tests/test_aviary/backend/server/plugins/test_router_query_engine.py
  178. +0 −6 tests/test_aviary/backend/server/test_metrics.py
  179. +2 −4 tests/test_aviary/backend/server/test_models.py
  180. +3 −3 tests/test_aviary/backend/server/test_router.py
  181. +2 −2 tests/test_aviary/backend/server/test_run.py
  182. +2 −2 tests/test_aviary/backend/server/test_task_set.py
  183. +2 −2 tests/test_aviary/backend/server/test_utils.py
  184. +99 −1 tests/test_aviary/common/test_prompt_format.py
5 changes: 1 addition & 4 deletions .github/workflows/docs.yaml
@@ -1,12 +1,9 @@
-name: 🦜🔍 Documentation build and deploy
+name: 🦜🔍 Documentation deploy

on:
  push:
    branches:
      - master
-  pull_request:
-    branches:
-      - master

permissions:
  contents: write
26 changes: 26 additions & 0 deletions .github/workflows/docs_build.yaml
@@ -0,0 +1,26 @@
name: 🦜🔍 Documentation build
on:
  pull_request:
    branches:
      - master

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install mkdocs-material
      - run: mkdocs build
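
This workflow only builds the documentation on pull requests. As a rough local equivalent (a minimal sketch, assuming it is run from the repository root, where the `mkdocs.yml` shown in the file list above lives):

```shell
# Local approximation of the docs build CI job (sketch).
# Assumes the repository root, which contains mkdocs.yml, is the working directory.
pip install mkdocs-material
mkdocs build  # writes the static site to ./site/ (ignored via .gitignore)
```
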
5 changes: 5 additions & 0 deletions .gitignore
@@ -232,6 +232,7 @@ tag-mapping.json
*.tmp
deploy/anyscale/service.yaml
out
+temp.py

# build output
build/
@@ -248,3 +249,7 @@ prompts.txt
site/

*.orig
+
+__pycache__
+
+.secretenv.yml
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -19,7 +19,7 @@ repos:
hooks:
- id: mypy
# NOTE: Exclusions are handled in pyproject.toml
-files: aviary
+files: rayllm
exclude: tests
additional_dependencies:
- mypy-extensions
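
The hook filter simply points mypy at the new `rayllm` package instead of `aviary`. As a sketch, the updated hook could be exercised locally like this (assuming pre-commit is installed in the environment):

```shell
# Sketch: run only the mypy hook defined in .pre-commit-config.yaml over the repo.
pip install pre-commit
pre-commit run mypy --all-files
```
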
13 changes: 10 additions & 3 deletions Dockerfile
@@ -1,7 +1,8 @@
# syntax=docker/dockerfile:1.4
+# Note: TRTLLM backend is not included in the dockerfile, it is planned to be added in the future.

ARG RAY_IMAGE="anyscale/ray"
-ARG RAY_TAG="2.7.0oss-py39-cu118"
+ARG RAY_TAG="2.9.0-py39-cu121"

# Use Anyscale base image
FROM ${RAY_IMAGE}:${RAY_TAG} AS aviary
@@ -16,18 +17,24 @@ ARG RAY_GID=100
ENV RAY_SERVE_ENABLE_NEW_HANDLE_API=1
ENV RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1
ENV RAY_SERVE_ENABLE_JSON_LOGGING=1
+ENV RAY_SERVE_PROXY_PREFER_LOCAL_NODE_ROUTING=1
+ENV RAY_SERVE_HTTP_KEEP_ALIVE_TIMEOUT_S=310
+ENV RAY_metrics_report_batch_size=400

ENV FORCE_CUDA=1
ENV HF_HUB_ENABLE_HF_TRANSFER=1
ENV SAFETENSORS_FAST_GPU=1
+ENV LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
+ENV OMPI_ALLOW_RUN_AS_ROOT=1
+ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

# Remove this line if we need the CUDA packages
# and NVIDIA fixes their repository #ir-gleaming-sky
RUN sudo rm -v /etc/apt/sources.list.d/cuda.list

# Install torch first
RUN pip install --no-cache-dir -U pip \
-&& pip install --no-cache-dir -i https://download.pytorch.org/whl/cu118 torch torchvision torchaudio \
+&& pip install --no-cache-dir -i https://download.pytorch.org/whl/cu121 torch~=2.1.0 torchvision torchaudio \
&& pip install --no-cache-dir tensorboard ninja

# The build context should be the root of the repo
@@ -40,7 +47,7 @@ COPY --chown=${RAY_UID}:${RAY_GID} "./models/README.md" "${RAY_MODELS_DIR}/READM
RUN cd "${RAY_DIST_DIR}" \
# Update accelerate so transformers doesn't complain.
&& pip install --no-cache-dir -U accelerate \
-&& pip install --no-cache-dir -U "$(ls aviary-*.whl | head -n1)[frontend,backend]" \
+&& pip install --no-cache-dir -U "$(ls rayllm-*.whl | head -n1)[frontend,backend]" \
# Purge caches
&& pip cache purge || true \
&& conda clean -a \
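
Since the build context is expected to be the repository root and the image installs a locally built `rayllm-*.whl`, building it could look roughly like the following sketch (the image tag is illustrative, and it assumes the wheel has already been produced, e.g. with the `build_rayllm_wheel.sh` script from the file list):

```shell
# Sketch: build the RayLLM image from the repository root (the stated build context).
# Assumes rayllm-*.whl has already been built (e.g. via ./build_rayllm_wheel.sh)
# into the location the Dockerfile copies from; "rayllm:dev" is an illustrative tag.
docker build -t rayllm:dev .
```
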
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,5 +1,5 @@
-include README.md README.ipynb LICENSE *.sh
+include README.md LICENSE *.sh
recursive-include tests *.py
recursive-include models *.yaml
recursive-include examples *.*
-recursive-include aviary/frontend *.js
+recursive-include rayllm/frontend *.js
67 changes: 24 additions & 43 deletions README.md
@@ -1,8 +1,16 @@
+============================
+# Archiving Ray LLM
+
+We had started RayLLM to simplify setting up and deploying LLMs on top of Ray Serve. In the past few months, vLLM has made significant improvements in ease of use. We are archiving the RayLLM project and instead adding some examples to our [Ray Serve docs](https://docs.ray.io/en/master/serve/tutorials/vllm-example.html) for deploying LLMs with Ray Serve and vLLM. This will reduce another library for the community to learn about and greatly simplify the workflow to serve LLMs at scale. We also recently launched [Hosted Anyscale](https://www.anyscale.com/) where you can serve LLMs with Ray Serve with some more capabilities out of the box like multi-lora with serve multiplexing, JSON mode function calling and further performance enhancements.
+
+
+============================
# RayLLM - LLMs on Ray

-[![Build status](https://badge.buildkite.com/d6d7af987d1db222827099a953410c4e212b32e8199ca513be.svg?branch=master)](https://buildkite.com/anyscale/aviary-docker)
+The hosted Aviary Explorer is not available anymore.
+Visit [Anyscale](https://endpoints.anyscale.com) to experience models served with RayLLM.

-Try it now: [🦜🔍 Ray Aviary Explorer 🦜🔍](http://aviary.anyscale.com/)
+[![Build status](https://badge.buildkite.com/d6d7af987d1db222827099a953410c4e212b32e8199ca513be.svg?branch=master)](https://buildkite.com/anyscale/aviary-docker)

RayLLM (formerly known as Aviary) is an LLM serving solution that makes it easy to deploy and manage
a variety of open source LLMs, built on [Ray Serve](https://docs.ray.io/en/latest/serve/index.html). It does this by:
@@ -15,10 +23,11 @@ a variety of open source LLMs, built on [Ray Serve](https://docs.ray.io/en/lates
- Fully supporting multi-GPU & multi-node model deployments.
- Offering high performance features like continuous batching, quantization and streaming.
- Providing a REST API that is similar to OpenAI's to make it easy to migrate and cross test them.
+- Supporting multiple LLM backends out of the box, including [vLLM](https://github.com/vllm-project/vllm) and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).

In addition to LLM serving, it also includes a CLI and a web frontend (Aviary Explorer) that you can use to compare the outputs of different models directly, rank them by quality, get a cost and latency estimate, and more.

-RayLLM supports continuous batching by integrating with [vLLM](https://github.com/vllm-project/vllm). Continuous batching allows you to get much better throughput and latency than static batching.
+RayLLM supports continuous batching and quantization by integrating with [vLLM](https://github.com/vllm-project/vllm). Continuous batching allows you to get much better throughput and latency than static batching. Quantization allows you to deploy compressed models with cheaper hardware requirements and lower inference costs. See [quantization guide](models/continuous_batching/quantization/README.md) for more details on running quantized models on RayLLM.

RayLLM leverages [Ray Serve](https://docs.ray.io/en/latest/serve/index.html), which has native support for autoscaling
and multi-node deployments. RayLLM can scale to zero and create
@@ -32,14 +41,14 @@ The guide below walks you through the steps required for deployment of RayLLM on

### Locally

-We highly recommend using the official `anyscale/aviary` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use-case due to specific dependencies required, some of which are not available on pip.
+We highly recommend using the official `anyscale/ray-llm` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use-case due to specific dependencies required, some of which are not available on pip.

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

-docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/aviary:latest bash
+docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/ray-llm:latest bash
# Inside docker container
-aviary run --model ~/models/continuous_batching/amazon--LightGPT.yaml
+serve run ~/serve_configs/amazon--LightGPT.yaml
```

### On a Ray Cluster
@@ -57,7 +66,7 @@ export AWS_SESSION_TOKEN=...

Start by cloning this repo to your local machine.

-You may need to specify your AWS private key in the `deploy/ray/aviary-cluster.yaml` file.
+You may need to specify your AWS private key in the `deploy/ray/rayllm-cluster.yaml` file.
See [Ray on Cloud VMs](https://docs.ray.io/en/latest/cluster/vms/index.html) page in
Ray documentation for more details.

@@ -66,14 +75,14 @@ git clone https://github.com/ray-project/ray-llm.git
cd ray-llm

# Start a Ray Cluster (This will take a few minutes to start-up)
-ray up deploy/ray/aviary-cluster.yaml
+ray up deploy/ray/rayllm-cluster.yaml
```

#### Connect to your Cluster

```shell
# Connect to the Head node of your Ray Cluster (This will take several minutes to autoscale)
-ray attach deploy/ray/aviary-cluster.yaml
+ray attach deploy/ray/rayllm-cluster.yaml

# Deploy the LightGPT model.
serve run serve_configs/amazon--LightGPT.yaml
@@ -84,14 +93,14 @@ or define your own model YAML file and run that instead.

### On Kubernetes

-For Kubernetes deployments, please see our extensive documentation for [deploying Ray Serve on KubeRay](https://docs.ray.io/en/latest/serve/production-guide/kubernetes.html).
+For Kubernetes deployments, please see our documentation for [deploying on KubeRay](https://github.com/ray-project/ray-llm/tree/master/docs/kuberay).

## Query your models

Once the models are deployed, you can install a client outside of the Docker container to query the backend.

```shell
-pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
+pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

You can query your RayLLM deployment in many ways.
@@ -219,47 +228,19 @@ print(chat_completion)
To install RayLLM and its dependencies, run the following command:

```shell
-pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
+pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

RayLLM consists of a set of configurations and utilities for deploying LLMs on Ray Serve,
in addition to a frontend (Aviary Explorer), both of which come with additional
dependencies. To install the dependencies for the frontend run the following commands:

```shell
-pip install "aviary[frontend] @ git+https://github.com/ray-project/ray-llm.git"
+pip install "rayllm[frontend] @ git+https://github.com/ray-project/ray-llm.git"
```

The backend dependencies are heavy weight, and quite large. We recommend using the official
-`anyscale/aviary` image. Installing the backend manually is not a supported usecase.

-## Running Aviary Explorer locally
-
-The frontend is a [Gradio](https://gradio.app/) interface that allows you to interact
-with the models in the backend through a web interface.
-The Gradio app is served using [Ray Serve](https://docs.ray.io/en/latest/serve/index.html).
-
-To run the Aviary Explorer locally, you need to set the following environment variable:
-
-```shell
-export ENDPOINT_URL=<hostname of the backend, eg. 'http://localhost:8000'>
-```
-
-Once you have set these environment variables, you can run the frontend with the
-following command:
-
-```shell
-serve run aviary.frontend.app:app --non-blocking
-```
-
-You will be able to access it at `http://localhost:8000/frontend` in your browser.
-
-To just use the Gradio frontend without Ray Serve, you can start it
-with `python aviary/frontend/app.py`. In that case, the Gradio interface should be accessible at `http://localhost:7860` in your browser.
-If running the frontend yourself is not an option, you can still use
-[our hosted version](http://aviary.anyscale.com/) for your experiments.
-
-Note that the frontend will not dynamically update the list of models should they change in the backend. In order for the frontend to update, you will need to restart it.
+`anyscale/ray-llm` image. Installing the backend manually is not a supported usecase.

### Usage stats collection

@@ -307,7 +288,7 @@ Run multiple models at once by aggregating the Serve configs for different model

 applications:
 - name: router
-  import_path: aviary.backend:router_application
+  import_path: rayllm.backend:router_application
   route_prefix: /
   args:
     models:
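
The README above describes an OpenAI-like REST API served behind `route_prefix: /`. As a rough sketch, a local deployment such as the LightGPT example could be queried along the following lines; the port comes from the `docker run` command above, while the `/v1/chat/completions` path and the `amazon/LightGPT` model id are assumptions based on the OpenAI-style API and the serve config file name:

```shell
# Sketch: query a local RayLLM deployment through its OpenAI-compatible API.
# Assumes the server is listening on localhost:8000 and exposes an OpenAI-style
# /v1/chat/completions route; "amazon/LightGPT" is inferred from the config name.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "amazon/LightGPT",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```
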
6 changes: 0 additions & 6 deletions aviary/__init__.py

This file was deleted.

4 changes: 0 additions & 4 deletions aviary/backend/__init__.py

This file was deleted.

14 changes: 0 additions & 14 deletions aviary/backend/llm/dict_utils.py

This file was deleted.
