Feature Request: Streaming Support for Model Invocations #675

Open
bannawandoor27 opened this issue Jan 4, 2025 · 1 comment

@bannawandoor27

Description:

I kindly request the implementation of streaming support for model invocations in the Modus SDK. This feature is crucial for real-time applications that require incremental responses.

Expected Changes:

  • Introduce a stream: true option in the ChatModelInput configuration.
  • Provide a mechanism to handle tokens as they are received (a hypothetical sketch follows this list).
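
For illustration, here is a rough sketch of what this could look like in the AssemblyScript SDK. The stream option and onToken callback are invented names for the proposed API and do not exist in the SDK today; the surrounding calls follow the existing models API.

```ts
// Hypothetical sketch only: `stream` and `onToken` are proposed, not real, APIs.
import { models } from "@hypermode/modus-sdk-as";
import { OpenAIChatModel, UserMessage } from "@hypermode/modus-sdk-as/models/openai";

export function generateStreaming(prompt: string): string {
  const model = models.getModel<OpenAIChatModel>("text-generator");
  const input = model.createInput([new UserMessage(prompt)]);

  input.stream = true; // proposed: request incremental output
  input.onToken = (token: string): void => {
    // proposed: called once per token as it is received, so the
    // application can forward it to the client immediately
  };

  const output = model.invoke(input);
  return output.choices[0].message.content;
}
```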

Benefits:

  • Reduces latency for real-time applications.
  • Enhances user experience by providing immediate feedback.

Thank you for considering this request.

@mattjohnsonpint
Member

Hi. Thanks for the feature request. We've looked into this a bit already, and we do intend to add this capability. Though I cannot give you a precise timeline for when it will be available, it is on our roadmap. I'll keep this issue open for now, so you and others can provide feedback.

To provide some background information, there are multiple parts to supporting this feature:

  1. Currently, outbound HTTP request bodies and resulting HTTP response bodies are synchronously passed into WASM memory in their entirety (as byte arrays). The entire request/response is handled in a single host function call. Implementing true streaming for our HTTP API in general would be a considerable effort. At some future point, we'll be able to take advantage of WASI-HTTP, but that isn't available in our upstream dependencies today.

    A more likely interim step is to provide a separate HTTP API in our SDKs, designed specifically for Server-Sent Events (SSE) APIs such as the one OpenAI uses for its streaming results. It would take a function callback as a parameter, which would continuously receive individual SSE messages while the connection is open. That would let you run custom code in response to each event message in real time. It would also include a way to terminate the current connection in response to an event, if desired (see the first sketch following this list).

  2. Modus currently offers only GraphQL endpoints, and exposes only the Query and Mutation root types. Thus, even if you had discrete events streamed from a model, we'd have no way to return any result until the entire response was ready.

    One way we could handle streaming responses is to support the @defer and @stream GraphQL directives in queries. However, these are still experimental and not standardized in GraphQL (as far as I am aware).

    A more readily available solution would be to implement support for GraphQL Subscription operations, using SSE as the transport mechanism. One would subscribe to a particular function, and that function would have a mechanism to emit events without exiting. The caller would then handle those events, received as a GraphQL-compliant SSE stream (see the second sketch following this list).

  3. Once we have the above two items worked out, there'd still be some work to use these features in the models APIs of our SDK. This part shouldn't be too difficult though.
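
To make item 1 above concrete, here is a rough sketch of what an SSE-oriented HTTP API could look like. http.fetchStream and its callback signature are invented for illustration; only the general shape (a request plus a per-message callback that can terminate the connection) comes from the description above.

```ts
// Hypothetical sketch: `http.fetchStream` is a proposed API, not a real one.
import { http } from "@hypermode/modus-sdk-as";

export function streamFromOpenAI(): void {
  const request = new http.Request("https://api.openai.com/v1/chat/completions");

  // Proposed: the callback fires once per SSE message while the connection
  // is open; returning false terminates the connection early.
  http.fetchStream(request, (message: string): bool => {
    if (message == "[DONE]") {
      return false; // OpenAI's SSE terminator; stop reading the stream
    }
    // handle one event message in real time
    return true;
  });
}
```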
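And for item 2, a sketch of what a subscription-capable function might look like. The EventSink type and its emit method are invented names for the proposed "emit events without exiting" mechanism.

```ts
// Hypothetical sketch: nothing here exists in the SDK today.
// Placeholder for the proposed event-emitting mechanism.
class EventSink {
  emit(payload: string): void {
    // would call into the host, which pushes one message onto the
    // caller's GraphQL-compliant SSE stream
  }
}

// Would be exposed as a GraphQL Subscription rather than a Query or Mutation.
export function watchGeneration(prompt: string, sink: EventSink): void {
  // invoke the model and emit each chunk as it arrives, e.g.:
  sink.emit("started: " + prompt);
  // ...
  // returning from the function completes the subscription
}
```

A caller would subscribe to that function and consume the resulting SSE stream one event at a time.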

I'd be very interested in collecting some use cases. There are two I can think of:

  • Aborting an LLM response midway through generation, such as when the result is a hallucination or does not conform to the expected output.

  • Delivering the results of the LLM response while they are being generated, such as with the typing animations one sees with ChatGPT and other online AI assistants.

Are there any others you can think of?

@mattjohnsonpint added the enhancement label Jan 6, 2025