-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support nova-3 and keyterms #277
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes update transcription configuration across examples and internal client code. In multiple files the transcription model has been updated to "nova-3" (replacing "nova-2" where applicable) and a new Keyterms field has been introduced—populated with either "deepgram" or "Bueller" as appropriate. In addition, the client code now includes validation checks that enforce the requirement that keyterms are only used when the model starts with "nova-3". Some older functions have been deprecated in favor of their new callback-based implementations. Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant F as Transcription Function
participant V as Validator
participant L as Logger
C->>F: Invoke transcription API with options
F->>V: Validate Model and Keyterms
alt Options Valid
V-->>F: Validation passed
F->>C: Proceed with API request
else Options Invalid
V-->>F: Validation failed
F->>L: Log warning ("Keyterms only supported with nova-3")
F->>C: Return nil response/error
end
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
863b652
to
a51351d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (11)
examples/speech-to-text/rest/intent/main.go (1)
35-36
: Add documentation for keyterms usage with nova-3.Consider adding a comment explaining how keyterms enhance the intent detection feature when used with nova-3.
Add this documentation above the options:
// set the Transcription options +// Note: Keyterms are only supported with nova-3 model and can enhance intent detection options := &interfaces.PreRecordedTranscriptionOptions{
examples/speech-to-text/rest/sentiment/main.go (1)
50-50
: Improve error message clarity.The error message "FromStream failed" is inconsistent with the actual method name "FromFile".
Update the error message:
-fmt.Printf("FromStream failed. Err: %v\n", err) +fmt.Printf("FromFile failed. Err: %v\n", err)examples/speech-to-text/rest/topic/main.go (1)
35-42
: Consider centralizing nova-3 configuration.Since nova-3 and keyterms configuration is consistent across examples, consider creating a helper function or default configuration to reduce duplication and ensure consistency.
Example implementation:
// pkg/client/listen/defaults.go func NewNova3Options() *interfaces.PreRecordedTranscriptionOptions { return &interfaces.PreRecordedTranscriptionOptions{ Model: "nova-3", Keyterms: []string{"deepgram"}, // Add other common defaults } }examples/speech-to-text/rest/summary/main.go (1)
34-41
: LGTM! Consider adding a comment explaining keyterms usage.The changes correctly implement nova-3 model and keyterms support. The configuration looks good with appropriate options for summarization.
Consider adding a comment explaining the purpose of keyterms for better developer experience:
options := &interfaces.PreRecordedTranscriptionOptions{ Model: "nova-3", + // Keyterms improve word recognition for specific terms Keyterms: []string{"deepgram"}, Punctuate: true, Language: "en-US", SmartFormat: true, Summarize: "v2", }
examples/speech-to-text/rest/url/main.go (1)
32-40
: Consider aligning keyterms with audio content.While the nova-3 model is correctly configured, the keyterm "deepgram" doesn't match the content of the audio file (Bueller). For better demonstration of keyterms functionality, consider using "Bueller" as the keyterm.
options := &interfaces.PreRecordedTranscriptionOptions{ Model: "nova-3", - Keyterms: []string{"deepgram"}, + Keyterms: []string{"Bueller"}, Punctuate: true, Diarize: true, Language: "en-US", Utterances: true, Redact: []string{"pci", "ssn"}, }examples/speech-to-text/websocket/http_callback/main.go (1)
34-39
: Consider dynamic keyterms for live streams.While the nova-3 model is correctly configured, using a static keyterm "deepgram" for a live news stream might not be optimal. Consider:
- Using more relevant keyterms for news content
- Making keyterms configurable via command-line flags
Example implementation with flags:
+import "flag" func main() { + // Add command-line flags + keytermsFlag := flag.String("keyterms", "news,breaking,headlines", "Comma-separated keyterms") + flag.Parse() + + // Parse keyterms + keyterms := strings.Split(*keytermsFlag, ",") + transcriptOptions := &interfaces.LiveTranscriptionOptions{ Model: "nova-3", - Keyterms: []string{"deepgram"}, + Keyterms: keyterms, Language: "en-US", Punctuate: true, }pkg/client/interfaces/v1/types-agent.go (1)
37-40
: Add documentation for the Listen struct.The Listen struct is missing documentation comments that describe its purpose and the usage of its fields, particularly the new Keyterms field.
Add documentation above the struct:
+// Listen contains configuration for the speech recognition component +// Model specifies the model to use for transcription (e.g., "nova-3") +// Keyterms provides a list of key terms to enhance recognition accuracy type Listen struct { Model string `json:"model,omitempty"` Keyterms []string `json:"keyterms,omitempty"` }pkg/client/interfaces/v1/types-stream.go (1)
28-28
: Add documentation for the Keyterms field.Please add a comment explaining the purpose and usage of the Keyterms field, similar to how other fields are documented (e.g., the Dictation field on line 21).
pkg/client/interfaces/v1/types-prerecorded.go (1)
34-34
: Add documentation for the Keyterms field.Please add a comment explaining the purpose and usage of the Keyterms field, similar to how other fields are documented.
pkg/client/agent/v1/websocket/new_using_chan.go (1)
73-76
: Consider extracting the keyterms validation logic.The keyterms validation logic is duplicated across both websocket client files. Consider extracting it into a common utility function to maintain DRY principles and ensure consistent validation.
Create a new function in a common package:
// pkg/client/common/v1/validation.go func ValidateKeytermsModel(keyterms []string, model string) error { if len(keyterms) > 0 && !strings.HasPrefix(model, "nova-3") { return fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", model) } return nil }Then use it in both websocket clients:
-if len(tOptions.Agent.Listen.Keyterms) > 0 && !strings.HasPrefix(tOptions.Agent.Listen.Model, "nova-3") { - klog.V(1).Info("Keyterms are only supported with nova-3 models.") - return nil, nil +if err := common.ValidateKeytermsModel(tOptions.Agent.Listen.Keyterms, tOptions.Agent.Listen.Model); err != nil { + klog.V(1).Info(err) + return nil, err }examples/agent/websocket/simple/main.go (1)
396-397
: LGTM! Consider adding a comment explaining keyterms.The nova-3 model and keyterms configuration look good. Consider adding a comment explaining the purpose and behavior of keyterms for better developer experience.
tOptions.Agent.Think.Instructions = "You are a helpful AI assistant." + // Configure the agent to listen using nova-3 model tOptions.Agent.Listen.Model = "nova-3" + // Configure keyterms to trigger specific agent behavior when detected tOptions.Agent.Listen.Keyterms = []string{"Bueller"}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
examples/agent/websocket/simple/main.go
(1 hunks)examples/speech-to-text/rest/callback/callback/main.go
(1 hunks)examples/speech-to-text/rest/file/main.go
(1 hunks)examples/speech-to-text/rest/intent/main.go
(1 hunks)examples/speech-to-text/rest/sentiment/main.go
(1 hunks)examples/speech-to-text/rest/stream/main.go
(1 hunks)examples/speech-to-text/rest/summary/main.go
(1 hunks)examples/speech-to-text/rest/topic/main.go
(1 hunks)examples/speech-to-text/rest/url/main.go
(1 hunks)examples/speech-to-text/websocket/http_callback/main.go
(1 hunks)examples/speech-to-text/websocket/http_channel/main.go
(1 hunks)examples/speech-to-text/websocket/microphone_callback/main.go
(1 hunks)examples/speech-to-text/websocket/microphone_channel/main.go
(1 hunks)examples/speech-to-text/websocket/replay/main.go
(1 hunks)pkg/client/agent/v1/websocket/new_using_chan.go
(2 hunks)pkg/client/interfaces/v1/types-agent.go
(1 hunks)pkg/client/interfaces/v1/types-prerecorded.go
(1 hunks)pkg/client/interfaces/v1/types-stream.go
(1 hunks)pkg/client/listen/v1/rest/client.go
(3 hunks)pkg/client/listen/v1/websocket/new_using_callbacks.go
(2 hunks)pkg/client/listen/v1/websocket/new_using_chan.go
(2 hunks)
🔇 Additional comments (10)
examples/speech-to-text/rest/callback/callback/main.go (1)
41-42
: Add validation for nova-3 keyterms requirement.The example should validate that keyterms are only used with nova-3 model to prevent runtime errors.
Run this script to check if validation is implemented in the client:
examples/speech-to-text/rest/topic/main.go (1)
50-50
: Fix incorrect error message.The error message "FromStream failed" is inconsistent with the actual method name "FromFile".
Please refer to the similar comment in the sentiment example for the suggested fix.
examples/speech-to-text/rest/stream/main.go (1)
34-41
: LGTM! Good alignment of keyterms with content.The changes correctly implement nova-3 model and the keyterm "Bueller" appropriately matches the content of the audio file.
examples/speech-to-text/websocket/replay/main.go (1)
31-39
: LGTM! Good example of nova-3 and keyterms usage.The changes correctly demonstrate how to use the nova-3 model and keyterms in a websocket streaming context.
examples/speech-to-text/websocket/http_channel/main.go (1)
38-43
: LGTM! Consistent with other examples.The changes correctly demonstrate nova-3 and keyterms usage in an HTTP streaming context.
examples/speech-to-text/rest/file/main.go (2)
34-42
: LGTM! Good example of contextual keyterm usage.The example demonstrates good practice by using a keyterm ("Bueller") that matches the content of the input audio file, making it a meaningful example.
34-42
: Verify model validation for keyterms usage.Let's verify that the codebase includes validation to ensure keyterms are only used with the nova-3 model.
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Search for model validation in the codebase rg -A 5 "keyterms.*nova-3|nova-3.*keyterms"Length of output: 45
🏁 Script executed:
#!/bin/bash echo "Searching for occurrences of PreRecordedTranscriptionOptions:" rg -n "PreRecordedTranscriptionOptions" . echo "Searching for potential validation logic combining keyterms and model:" rg -A 10 -B 10 'if.*Options\.Keyterms' .Length of output: 10308
🏁 Script executed:
#!/bin/bash # Inspect the Check() implementation for PreRecordedTranscriptionOptions to verify model-keyterms validation rg -n -A 30 -B 10 'func \(o \*PreRecordedTranscriptionOptions\) Check() error' ./pkg/client/interfaces/v1/options.goLength of output: 118
🏁 Script executed:
#!/bin/bash echo "Searching for any Check() implementation across the repository:" rg -n "func .*Check() error" .Length of output: 169
🏁 Script executed:
#!/bin/bash # Search for the explicit log message that confirms keyterms validation with nova-3 rg -n "Keyterms are only supported with nova-3 models" .Length of output: 647
Keyterms validation confirmed for non‑nova-3 models.
The codebase includes explicit checks that log a message and abort processing when keyterms are provided with a model other than nova‑3. For example, in both the websocket and REST client implementations (see pkg/client/listen/v1/websocket/new_using_callbacks.go, pkg/client/listen/v1/websocket/new_using_chan.go, and pkg/client/listen/v1/rest/client.go), the code verifies that if keyterms are set and the model does not start with “nova‑3”, then keyterms are rejected (with a log message stating “Keyterms are only supported with nova‑3 models.”). Since the example in file examples/speech-to-text/rest/file/main.go uses the nova‑3 model, it follows the validation rules.
examples/speech-to-text/websocket/microphone_callback/main.go (1)
128-129
: LGTM! Model and keyterms configuration added.The changes correctly update the model to "nova-3" and add keyterms support, aligning with the PR objectives.
examples/speech-to-text/websocket/microphone_channel/main.go (1)
227-228
: LGTM! Model and keyterms configuration added.The changes correctly update the model to "nova-3" and add keyterms support, aligning with the PR objectives.
examples/agent/websocket/simple/main.go (1)
393-395
: Verify the think model configuration.The think configuration uses "gpt-4o-mini" as the model. Please verify if this is the correct model name.
✅ Verification successful
🌐 Web query:
Is "gpt-4o-mini" a valid OpenAI model name?
💡 Result:
Yes, "gpt-4o-mini" is a valid OpenAI model name. Here’s a breakdown of key details: ### Official Model Recognition 1. **OpenAI’s official announcement** explicitly names it as their most cost-efficient small model for text/vision tasks[1][7]. 2. **Azure AI integration** confirms availability on Microsoft’s platform with technical documentation listing `gpt-4o-mini` under valid model names[7][10]. ### Key Features - **Pricing**: 15¢/million input tokens and 60¢/million output tokens[1][2]. - **Capabilities**: - Supports text and image inputs (with audio/video coming later)[1][10]. - 128K token context window and improved non-English text handling[1][2]. - Scores **82% on MMLU**, outperforming GPT-3.5 Turbo in chat preferences[1][4]. ### Naming Convention Context OpenAI’s guidelines classify it under **flagship models** alongside GPT-4o[2]. Third-party platforms like OpenRouter.ai also validate its legitimacy[4]. ### Known Issues Some users reported errors like *“Unknown model ‘gpt-4o-mini’”* due to: - Outdated API packages or deployment delays on Azure[3][9]. - Temporary compatibility issues with tools like Langflow[6]. For reliable usage, ensure your API client is updated and refer to OpenAI’s latest documentation[1][10]. Citations: - 1: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ - 2: https://www.restack.io/p/openai-answer-model-naming-guidelines-cat-ai - 3: https://community.llamaindex.ai/azure-openai-gpt4omini-support-unknown-model-NvRxSRHy739R - 4: https://openrouter.ai/openai/gpt-4o-mini - 5: https://spyglass.org/strawberry-openai-o1/ - 6: https://community.openai.com/t/unknown-model-gpt-4o-mini/946600 - 7: https://azure.microsoft.com/en-us/blog/openais-fastest-model-gpt-4o-mini-is-now-available-on-azure-ai/ - 8: https://community.openai.com/t/naming-convention-for-gpt-4-0/1096783 - 9: https://community.openai.com/t/gpt-4o-mini-not-working-after-package-update/876511 - 10: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure%2Cglobal-standard%2Cstandard-chat-completions&wt.mc_id=studentamb_264449
Think Model Configuration Verified
The web query confirms that "gpt-4o-mini" is a valid OpenAI model name, so no changes are required.
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") { | ||
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling for keyterms validation.
The current implementation returns nil, nil
when keyterms validation fails, which makes it unclear whether the operation succeeded or failed. Consider returning a proper error to indicate the validation failure.
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") {
klog.V(1).Info("Keyterms are only supported with nova-3 models.")
- return nil, nil
+ return nil, fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", tOptions.Model)
}
if len(tOptions.Agent.Listen.Keyterms) > 0 && !strings.HasPrefix(tOptions.Agent.Listen.Model, "nova-3") { | ||
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling for keyterms validation.
The current implementation returns nil, nil
when keyterms validation fails, which makes it unclear whether the operation succeeded or failed. Consider returning a proper error to indicate the validation failure.
if len(tOptions.Agent.Listen.Keyterms) > 0 && !strings.HasPrefix(tOptions.Agent.Listen.Model, "nova-3") {
klog.V(1).Info("Keyterms are only supported with nova-3 models.")
- return nil, nil
+ return nil, fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", tOptions.Agent.Listen.Model)
}
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") { | ||
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Return a descriptive error instead of nil.
While the validation logic is correct, returning nil, nil
might be unclear to callers. Consider returning a descriptive error to better communicate why the operation failed.
Apply this diff to improve error handling:
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") {
klog.V(1).Info("Keyterms are only supported with nova-3 models.")
- return nil, nil
+ return nil, fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", tOptions.Model)
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return nil, nil | |
} | |
if len(tOptions.Keyterms) > 0 && !strings.HasPrefix(tOptions.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return nil, fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", tOptions.Model) | |
} |
if len(req.Keyterms) > 0 && !strings.HasPrefix(req.Model, "nova-3") { | ||
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Return a descriptive error in DoFile.
Return a descriptive error to better communicate why the operation failed.
Apply this diff to improve error handling:
if len(req.Keyterms) > 0 && !strings.HasPrefix(req.Model, "nova-3") {
klog.V(1).Info("Keyterms are only supported with nova-3 models.")
- return nil
+ return fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", req.Model)
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if len(req.Keyterms) > 0 && !strings.HasPrefix(req.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return nil | |
} | |
if len(req.Keyterms) > 0 && !strings.HasPrefix(req.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", req.Model) | |
} |
if len(options.Keyterms) > 0 && !strings.HasPrefix(options.Model, "nova-3") { | ||
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Return a descriptive error in DoStream.
Return a descriptive error to better communicate why the operation failed.
Apply this diff to improve error handling:
if len(options.Keyterms) > 0 && !strings.HasPrefix(options.Model, "nova-3") {
klog.V(1).Info("Keyterms are only supported with nova-3 models.")
- return nil
+ return fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", options.Model)
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if len(options.Keyterms) > 0 && !strings.HasPrefix(options.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return nil | |
} | |
if len(options.Keyterms) > 0 && !strings.HasPrefix(options.Model, "nova-3") { | |
klog.V(1).Info("Keyterms are only supported with nova-3 models.") | |
return fmt.Errorf("keyterms are only supported with nova-3 models, got model: %s", options.Model) | |
} |
Proposed changes
Types of changes
What types of changes does your code introduce to the community Go SDK?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
Summary by CodeRabbit
New Features
Bug Fixes