Create new partitioned TaskEvent table, and switch to it gradually as new runs are created #1696
Conversation
Walkthrough

This pull request removes the span link rendering functionality from several components by deleting the …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Service as Service/Presenter
    participant Resolver as getTaskEventStoreTableForRun
    participant Repo as EventRepository
    participant DB as Database
    Service->>Resolver: Determine event store table for run
    Resolver-->>Service: Return table name
    Service->>Repo: Call updated query/completion with (table, runId, createdAt, completedAt)
    Repo->>DB: Execute query/command on the specified table
    DB-->>Repo: Return events/result
    Repo-->>Service: Return processed data
```
Actionable comments posted: 0
🔭 Outside diff range comments (2)
apps/webapp/app/v3/eventRepository.server.ts (1)

231-285: ⚠️ Potential issue — Synchronize partial events resolution

When `completeEvent` queries incomplete events and then inserts a final record, a race condition could occur if multiple requests handle the same span in parallel. Consider adding an upsert or a guard to avoid duplicate finalization.

```diff
+ // Suggestion: Add a check to ensure that only one thread can finalize a span at a time
+ // through a transactional or upsert approach.
```
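For illustration, one way such a guard could look (a sketch only; the `taskEvent` model name, the `spanId`/`isPartial` uniqueness check, and the Prisma client wiring are assumptions, not the repository's actual API):

```ts
import { Prisma, PrismaClient } from "@prisma/client";

// Hypothetical guard: run the duplicate check and the final insert in one
// serializable transaction, so two concurrent completions of the same span
// cannot both insert a final record.
async function finalizeSpanOnce(
  prisma: PrismaClient,
  spanId: string,
  finalRecord: Prisma.TaskEventCreateInput
) {
  return prisma.$transaction(
    async (tx) => {
      const alreadyFinal = await tx.taskEvent.findFirst({
        where: { spanId, isPartial: false },
        select: { id: true },
      });
      if (alreadyFinal) return alreadyFinal; // another request finished first
      return tx.taskEvent.create({ data: finalRecord });
    },
    { isolationLevel: Prisma.TransactionIsolationLevel.Serializable }
  );
}
```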
apps/webapp/app/v3/services/cancelTaskRun.server.ts (1)

87-104: 🛠️ Refactor suggestion — Deduplicate event cancellation logic.

This event cancellation logic is duplicated in `cancelAttempt.server.ts`. Consider extracting it to a shared service. Create a new shared `CancelEventsService` that both `CancelTaskRunService` and `CancelAttemptService` can use (note the `taskEventStore` field added to the parameter type so `getTaskEventStoreTableForRun` can resolve the table):

```ts
// cancelEvents.server.ts
export class CancelEventsService extends BaseService {
  public async call(
    taskRun: {
      friendlyId: string;
      taskEventStore: string;
      createdAt: Date;
      completedAt: Date | null;
    },
    cancelledAt: Date,
    reason: string
  ) {
    const inProgressEvents = await eventRepository.queryIncompleteEvents(
      getTaskEventStoreTableForRun(taskRun),
      { runId: taskRun.friendlyId },
      taskRun.createdAt,
      taskRun.completedAt ?? undefined
    );

    logger.debug("Cancelling in-progress events", {
      inProgressEvents: inProgressEvents.map((event) => event.id),
    });

    await Promise.all(
      inProgressEvents.map((event) =>
        eventRepository.cancelEvent(event, cancelledAt, reason)
      )
    );
  }
}
```
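Both cancel services could then delegate to it; roughly (constructor wiring assumed):

```ts
// Inside CancelTaskRunService.call / CancelAttemptService.call (hypothetical):
const cancelEventsService = new CancelEventsService();
await cancelEventsService.call(taskRun, new Date(), "Run cancelled");
```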
🧹 Nitpick comments (15)
apps/webapp/app/v3/services/triggerTask.server.ts (2)
398-398: Consider adding error handling for `getTaskEventStore()`.

The `getTaskEventStore()` call could potentially fail. Consider wrapping it in a try-catch block or handling potential errors to ensure graceful degradation.

```diff
- taskEventStore: getTaskEventStore(),
+ taskEventStore: await this.#handleTaskEventStore(),
```

Add this private method to handle the task event store:

```ts
async #handleTaskEventStore() {
  try {
    return getTaskEventStore();
  } catch (error) {
    logger.error("Failed to get task event store", { error });
    return undefined;
  }
}
```
62-649: Consider breaking down the `call` method into smaller, focused methods.

The `call` method is quite long (587 lines) and handles multiple concerns. Consider extracting the following responsibilities into separate private methods:

- Validation logic
- Task run creation
- Queue management
- Dependency handling

This would improve maintainability, testability, and readability.

Example refactor for validation logic:

```ts
private async validateTaskTrigger(
  taskId: string,
  environment: AuthenticatedEnvironment,
  body: TriggerTaskRequestBody,
  options: TriggerTaskServiceOptions
): Promise<void> {
  // Extract validation logic here
  if (environment.type !== "DEVELOPMENT" && !options.skipChecks) {
    const result = await getEntitlement(environment.organizationId);
    if (result && result.hasAccess === false) {
      throw new OutOfEntitlementError();
    }
  }
  // ... other validation logic
}
```

🧰 Tools
🪛 Biome (1.9.4)
[error] 254-255: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
apps/webapp/app/v3/taskEventStore.server.ts (3)
1-35: Ensure clear distinction between standard and partitioned usage

Overall, the definitions and exported helpers for determining which table to use look good. Consider documenting the scenarios when each table is selected to prevent confusion among contributors about why or how the environment variable toggles partitioning.
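For example, the selection rules could carry doc comments along these lines (a paraphrased sketch of the helpers; exact signatures and the import path are assumptions):

```ts
import { env } from "~/env.server"; // path assumed

export type TaskEventStoreTable = "taskEvent" | "taskEventPartitioned";

// Writes: the env var only decides the store for *newly triggered* runs.
export function getTaskEventStore(): TaskEventStoreTable {
  return env.TASK_EVENT_PARTITIONING_ENABLED === "1" ? "taskEventPartitioned" : "taskEvent";
}

// Reads: always follow the store recorded on the run at trigger time, never
// the env var, so historical runs keep resolving to the unpartitioned table.
export function getTaskEventStoreTableForRun(run: { taskEventStore?: string }): TaskEventStoreTable {
  return run.taskEventStore === "taskEventPartitioned" ? "taskEventPartitioned" : "taskEvent";
}
```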
72-115: Revisit readReplica usage and date-range conditions

When querying partitioned tables, reads are routed to the `readReplica`. For systems with eventual consistency, there could be a small time delay before data is visible. Confirm whether consistency requirements are relaxed enough to allow read operations on the replica, especially when date filtering might boundary-include/exclude rows.

Do you want help writing a verification script to confirm the replication lag or indexing overhead for these time-based queries?
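As a starting point, a check along these lines could measure replica staleness (a sketch; `pg_last_xact_replay_timestamp()` is a standard Postgres standby function, while `readReplica` and `logger` are assumed from the surrounding module):

```ts
// Run against the read replica: seconds since the last replayed transaction.
// A null result means the server is not acting as a standby.
const [row] = await readReplica.$queryRaw<{ lag_seconds: number | null }[]>`
  SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::float8 AS lag_seconds
`;
logger.debug("read replica lag", { lagSeconds: row?.lag_seconds });
```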
117-176: Check for potential large result sets in findTraceEvents

The raw SQL query with `LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}` helps cap the result size, but if `traceId` references a high-traffic process, you could still have large partial sets to process. Consider implementing pagination or streaming if you foresee extremely large trace detail queries.
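If it comes to that, keyset pagination over `startTime` is one option (a sketch, not the current implementation; ties on `startTime` would need a secondary key such as `spanId` to avoid skipped rows):

```ts
import { PrismaClient } from "@prisma/client";

// Keyset pagination: fetch the next page of a trace in startTime order,
// resuming strictly after the last startTime already seen.
async function findTraceEventsPage(
  readReplica: PrismaClient,
  traceId: string,
  afterStartTime: bigint, // pass 0n for the first page
  pageSize: number
) {
  return readReplica.$queryRaw<unknown[]>`
    SELECT "spanId", "parentId", "startTime", duration, "isError"
    FROM "TaskEvent"
    WHERE "traceId" = ${traceId} AND "startTime" > ${afterStartTime}
    ORDER BY "startTime" ASC
    LIMIT ${pageSize}
  `;
}
```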
apps/webapp/app/v3/eventRepository.server.ts (3)

361-421: Clarify logic for "allowCompleteDuplicate"

The `queryIncompleteEvents` method filters out events that have a completed duplicate. This logic is beneficial, but it might be unclear to new contributors. Consider adding a brief comment explaining how `allowCompleteDuplicate` changes the filtering outcome.
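Something along these lines might do (the wording is a guess at the intended semantics, not the author's):

```ts
// When allowCompleteDuplicate is false, an incomplete (partial) event is
// filtered out if a completed event with the same spanId already exists:
// the span has effectively finished, so it must not be treated as still
// in progress (e.g. when deciding which events to cancel).
```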
423-646: Potential performance constraints in large trace retrieval

`getTraceSummary`, `getRunEvents`, and `getSpan` combine to build robust trace overviews. If used for extremely large runs, performance might degrade. Evaluate whether you need pagination or summarization for extremely large traces or runs beyond the current limit approach.
647-787: Prevent infinite ancestor walks

`#createSpanFromEvent` and `#walkSpanAncestors` rely on a `level >= 8` safeguard to stop indefinite recursion. Confirm that 8 is the correct threshold or consider a configurable approach for more elaborate nested calls.
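If the threshold should be tunable, the existing zod env pattern could be reused (a sketch; the variable name `MAX_SPAN_ANCESTOR_WALK_DEPTH` is hypothetical):

```ts
// Hypothetical: in env.server.ts, alongside the other zod-validated vars
// MAX_SPAN_ANCESTOR_WALK_DEPTH: z.coerce.number().int().default(8),

// ...then in the ancestor walker, replacing the literal 8:
if (level >= env.MAX_SPAN_ANCESTOR_WALK_DEPTH) {
  return; // stop walking and treat deeper ancestors as truncated
}
```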
apps/webapp/app/routes/resources.runs.$runParam.logs.download.ts (1)

35-40: Add error handling for undefined dates.

While the implementation is correct, it would be beneficial to add validation for the date parameters to ensure they are defined and in the correct order.

```diff
 const runEvents = await eventRepository.getRunEvents(
   getTaskEventStoreTableForRun(run),
   run.friendlyId,
+  // Ensure createdAt is defined
+  run.createdAt ?? new Date(0),
+  // Ensure completedAt is after createdAt
+  run.completedAt && run.completedAt > run.createdAt ? run.completedAt : undefined
-  run.createdAt,
-  run.completedAt ?? undefined
 );
```
apps/webapp/app/v3/services/cancelAttempt.server.ts (1)

76-93: Consider extracting event cancellation logic to a separate method.

The event querying and cancellation logic could be extracted into a dedicated method for better organization and reusability.

```diff
+ private async cancelInProgressEvents(
+   taskRun: { friendlyId: string; createdAt: Date; completedAt: Date | null },
+   cancelledAt: Date,
+   reason: string
+ ) {
+   const inProgressEvents = await eventRepository.queryIncompleteEvents(
+     getTaskEventStoreTableForRun(taskRun),
+     { runId: taskRun.friendlyId },
+     taskRun.createdAt,
+     taskRun.completedAt ?? undefined
+   );
+
+   logger.debug("Cancelling in-progress events", {
+     inProgressEvents: inProgressEvents.map((event) => event.id),
+   });
+
+   await Promise.all(
+     inProgressEvents.map((event) =>
+       eventRepository.cancelEvent(event, cancelledAt, reason)
+     )
+   );
+ }
```

apps/webapp/app/presenters/v3/RunPresenter.server.ts (1)
111-116: Consider adding type safety for taskEventStore field.

The `taskEventStore` field should be properly typed to ensure type safety throughout the application.

```diff
+type TaskEventStore = {
+  taskEventStore: string;
+  createdAt: Date;
+  completedAt: Date | null;
+};
+
 const traceSummary = await eventRepository.getTraceSummary(
   getTaskEventStoreTableForRun(run as TaskEventStore),
   run.traceId,
   run.createdAt,
   run.completedAt ?? undefined
 );
```

apps/webapp/app/env.server.ts (2)
485-486: Add documentation for the new environment variables.

Consider adding JSDoc comments to explain:

- The purpose and impact of enabling task event partitioning
- The significance of the partitioned window duration
- Any performance implications or recommendations

```diff
+/**
+ * Controls whether task events are stored in partitioned tables.
+ * @default "0" (disabled)
+ */
 TASK_EVENT_PARTITIONING_ENABLED: z.string().default("0"),
+/**
+ * The time window in seconds for each task event partition.
+ * @default 60 (1 minute)
+ */
 TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS: z.coerce.number().int().default(60), // 1 minute
```
485-485: Consider using boolean coercion for the partitioning flag.

For better type safety and consistency with boolean values, consider using `z.coerce.boolean()` instead of string.

```diff
-TASK_EVENT_PARTITIONING_ENABLED: z.string().default("0"),
+TASK_EVENT_PARTITIONING_ENABLED: z.coerce.boolean().default(false),
```

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)
75-75: Consider removing unused taskEventStore selection.

The `taskEventStore` field is selected in the query but not used in the subsequent code. Consider removing it if it's not needed for the current implementation.

internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1)
1-54: Well-designed partitioned table for task events.

The table structure is well thought out:

- Range partitioning on `createdAt` is perfect for time-series data
- Composite primary key `(id, createdAt)` supports partitioning
- Fields comprehensively cover all event tracking needs

Consider creating partition maintenance procedures to (see the sketch after this list):

- Automatically create new partitions ahead of time
- Archive/drop old partitions for data retention
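A rough sketch of such a maintenance job (daily, date-named partitions are an assumption here; tools like pg_partman automate this kind of work):

```ts
import { PrismaClient } from "@prisma/client";

// Create tomorrow's partition ahead of time and drop the one past retention.
// DDL cannot be parameterized, so names/dates are built only from Date values.
async function maintainTaskEventPartitions(prisma: PrismaClient, retentionDays: number) {
  const dayStr = (d: Date) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  const addDays = (n: number) => new Date(Date.now() + n * 24 * 60 * 60 * 1000);

  const from = dayStr(addDays(1));
  const to = dayStr(addDays(2));
  await prisma.$executeRawUnsafe(
    `CREATE TABLE IF NOT EXISTS "TaskEventPartitioned_${from.replaceAll("-", "")}"
       PARTITION OF "TaskEventPartitioned"
       FOR VALUES FROM ('${from}') TO ('${to}')`
  );

  const expired = dayStr(addDays(-retentionDays)).replaceAll("-", "");
  await prisma.$executeRawUnsafe(`DROP TABLE IF EXISTS "TaskEventPartitioned_${expired}"`);
}
```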
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- apps/webapp/app/components/runs/v3/RunInspector.tsx (0 hunks)
- apps/webapp/app/components/runs/v3/SpanInspector.tsx (2 hunks)
- apps/webapp/app/env.server.ts (1 hunks)
- apps/webapp/app/presenters/v3/RunPresenter.server.ts (3 hunks)
- apps/webapp/app/presenters/v3/SpanPresenter.server.ts (5 hunks)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.traces.$traceId.spans.$spanId/route.tsx (0 hunks)
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (0 hunks)
- apps/webapp/app/routes/resources.runs.$runParam.logs.download.ts (2 hunks)
- apps/webapp/app/utils/pathBuilder.ts (0 hunks)
- apps/webapp/app/utils/taskEvent.ts (0 hunks)
- apps/webapp/app/v3/eventRepository.server.ts (17 hunks)
- apps/webapp/app/v3/services/cancelAttempt.server.ts (2 hunks)
- apps/webapp/app/v3/services/cancelTaskRun.server.ts (2 hunks)
- apps/webapp/app/v3/services/completeAttempt.server.ts (5 hunks)
- apps/webapp/app/v3/services/crashTaskRun.server.ts (2 hunks)
- apps/webapp/app/v3/services/expireEnqueuedRun.server.ts (2 hunks)
- apps/webapp/app/v3/services/triggerTask.server.ts (2 hunks)
- apps/webapp/app/v3/taskEventStore.server.ts (1 hunks)
- internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1 hunks)
- internal-packages/database/prisma/migrations/20250212075957_add_task_event_store_to_task_run/migration.sql (1 hunks)
- internal-packages/database/prisma/schema.prisma (2 hunks)
💤 Files with no reviewable changes (5)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.traces.$traceId.spans.$spanId/route.tsx
- apps/webapp/app/utils/taskEvent.ts
- apps/webapp/app/utils/pathBuilder.ts
- apps/webapp/app/components/runs/v3/RunInspector.tsx
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (16)
apps/webapp/app/v3/services/triggerTask.server.ts (1)
32-32: LGTM!

The import statement for `getTaskEventStore` is correctly placed and follows the project's import style.

apps/webapp/app/components/runs/v3/SpanInspector.tsx (3)
1-1: LGTM! Import changes align with component requirements.

The import changes correctly support the component's functionality:
- Core utilities for duration formatting
- Spinner component for pending state visualization
- Simplified path utilities after link handling removal
Also applies to: 8-8, 20-20
219-264: LGTM! SpanTimeline improvements enhance user experience.

The SpanTimeline component effectively:
- Provides clear visual feedback for pending state using the Spinner
- Maintains consistent time formatting
- Handles all states (pending, complete, error) appropriately
25-210: Verify the impact of link removal on user navigation.

While the component's core functionality remains intact, the removal of link handling might affect user navigation patterns. Please ensure that:
- Users can still effectively navigate between related spans
- The removal of links doesn't break any critical user journeys
- Alternative navigation paths are available where needed
Let's verify the impact with this script:
✅ Verification successful
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for any remaining link-related code or dependencies

# Search for any remaining link-related patterns
echo "Checking for remaining link patterns..."
rg -i "link" --type tsx --type ts

# Search for navigation patterns to ensure alternatives exist
echo "Checking navigation patterns..."
rg -i "navigate|route|path" --type tsx --type ts
```

Length of output: 302
I'll now run an additional script to search for navigation-related components—specifically looking for the usage of the `TextLink` component and the `v3RunsPath` function—to confirm that navigation remains in place despite the removal of legacy link-handling code.
🏁 Script executed:
```bash
#!/bin/bash
echo "Searching for 'TextLink' usage..."
rg -i "TextLink"
echo "Searching for 'v3RunsPath' usage..."
rg -i "v3RunsPath"
```

Length of output: 16320
Navigation remains intact despite link removal changes.
Our analysis confirmed that the `TextLink` component and the `v3RunsPath` function are actively used in multiple areas—including in the `SpanInspector` component via the "Task" property—indicating that users can still navigate effectively. The current implementation supports navigation with updated patterns, and no critical issues were found.

apps/webapp/app/v3/taskEventStore.server.ts (1)
37-60: Validate inputs before inserting

The `create` and `createMany` methods rely on trusted data. Consider adding assertion checks or validations for critical fields (e.g., ensuring `data` or array length is not empty) before writing to the database to reduce the likelihood of invalid records.
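A minimal guard might look like this (a sketch; `Prisma.TaskEventCreateManyInput` assumes the generated type for the model name):

```ts
import { Prisma } from "@prisma/client";

// Reject empty batches up front instead of issuing a pointless insert,
// making the failure mode explicit for callers.
function assertNonEmptyBatch(data: Prisma.TaskEventCreateManyInput[]): void {
  if (data.length === 0) {
    throw new Error("createMany called with an empty batch of task events");
  }
}
```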
apps/webapp/app/v3/eventRepository.server.ts (2)

37-106: Partitioning checks for partial events

Introducing the `TaskEventStore` property and the partitioning flag is a good approach. As you rely on the environment variable for toggling, confirm that higher-level code (such as migrations or data backfill routines) aligns with whether partitioning is expected.
1078-1096: Ensure consistent usage of taskEventStoreTable

The `private get taskEventStoreTable()` fallback and its subsequent use in `createMany` is a neat approach. Double-check references in other code paths to confirm consistent usage across your entire codebase so the partitioning logic is never bypassed inadvertently.

apps/webapp/app/v3/services/expireEnqueuedRun.server.ts (2)
7-7: Confirm import location

The import of `getTaskEventStoreTableForRun` appears correct. If you rename or refactor the store logic in the future, ensure the path remains consistent across services like `ExpireEnqueuedRunService` or `CancelTaskRunService`.
81-103: Complete event logic

Using `eventRepository.completeEvent` with `getTaskEventStoreTableForRun(run)` is good for correctness under partitioned scenarios. However, ensure that if `run.createdAt` is far in the past, no relevant events get excluded by your date constraints in event repository queries.

apps/webapp/app/v3/services/crashTaskRun.server.ts (1)
123-131: LGTM! Proper integration of task event store partitioning.

The changes correctly integrate the new task event store functionality while maintaining existing error handling and logging. The addition of timestamps provides better context for event queries.
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)
210-216: LGTM! Consistent implementation of task event store partitioning.

The changes correctly integrate the new task event store functionality while maintaining existing error handling and data transformation logic.
Also applies to: 366-372
apps/webapp/app/v3/services/completeAttempt.server.ts (1)
168-186: LGTM! Comprehensive integration of task event store partitioning.

The changes correctly integrate the new task event store functionality across all paths (success, failure, and system failure), while maintaining:
- Existing error handling
- Retry logic
- Event completion semantics
Also applies to: 317-337, 379-386, 414-435
internal-packages/database/prisma/migrations/20250212075957_add_task_event_store_to_task_run/migration.sql (1)
1-5: LGTM! The migration safely adds the task event store column.

The migration correctly adds the `taskEventStore` column with a default value, ensuring backward compatibility with existing rows.

internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1)
56-63: Appropriate indexes for common query patterns.

The indexes on `traceId`, `spanId`, and `runId` will support efficient:
- Trace/span lookups for distributed tracing
- Run-specific event queries
✅ Verification successful
🏁 Script executed:
```bash
#!/bin/bash
# Verify if similar indexes exist on the original TaskEvent table for consistency
rg -A 5 'CREATE INDEX.*TaskEvent.*(traceId|spanId|runId)'
```

Length of output: 1618
Index Consistency Verified
The migration file for the partitioned table correctly creates indexes on `traceId`, `spanId`, and `runId`, matching the indexes previously established in the `TaskEvent` table migrations. This ensures that query patterns for distributed tracing and run-specific events are consistently optimized.

- The partitioned table (`TaskEventPartitioned`) defines indexes analogous to those on the original table (`TaskEvent`).
- Shell script results confirm that index creation for these columns exists in both places.
internal-packages/database/prisma/schema.prisma (2)
1732-1732: LGTM! TaskRun model updated with event store tracking.

The `taskEventStore` field is correctly added with a default value of "taskEvent", matching the migration.
2710-2805: Well-structured TaskEventPartitioned model.

The model correctly implements the partitioned table with:

- Composite primary key using `@@id([id, createdAt])`
- Matching indexes from the migration
- Field definitions consistent with the original TaskEvent model
Let's verify the field consistency between TaskEvent and TaskEventPartitioned models:
✅ Verification successful
🏁 Script executed:
```bash
#!/bin/bash
# Extract and compare field definitions between TaskEvent and TaskEventPartitioned
ast-grep --pattern $'model TaskEvent {
  $$$
}'
ast-grep --pattern $'model TaskEventPartitioned {
  $$$
}'
```

Length of output: 108
I'll generate a revised script to extract and display the full definitions of both models using `rg`. This should help us manually verify the field consistency between the `TaskEvent` and `TaskEventPartitioned` models. Once we have the output from these commands, we can compare the field definitions to confirm consistency.
🏁 Script executed:
```bash
#!/bin/bash
echo "==== TaskEvent model definition ===="
rg "model TaskEvent {" -A 200 internal-packages/database/prisma/schema.prisma
echo "==== TaskEventPartitioned model definition ===="
rg "model TaskEventPartitioned {" -A 200 internal-packages/database/prisma/schema.prisma
```

Length of output: 698
Let's try a new script using fixed-string search with `rg -F` to avoid regex errors. This will help us directly verify field consistency between the two models.
🏁 Script executed:
```bash
#!/bin/bash
echo "==== TaskEvent model definition ===="
rg -F "model TaskEvent {" -A 300 internal-packages/database/prisma/schema.prisma
echo "==== TaskEventPartitioned model definition ===="
rg -F "model TaskEventPartitioned {" -A 300 internal-packages/database/prisma/schema.prisma
```

Length of output: 11720
Field consistency verified between TaskEvent and TaskEventPartitioned
The field definitions align exactly between the two models except for the intentional change in the primary key—in TaskEvent, a single field is marked as the primary key, while in TaskEventPartitioned a composite primary key (id and createdAt) is used to support partitioning. This confirms that the partitioned model correctly mirrors the original TaskEvent model’s structure with the planned adjustments.
Actionable comments posted: 1
🧹 Nitpick comments (5)
apps/webapp/app/v3/taskEventStore.server.ts (5)
25-31: Consider handling unknown or malformed `taskEventStore` values more explicitly.

If `run.taskEventStore` contains an unexpected string, it silently defaults to `"taskEvent"`. Consider validating the field or logging a warning to flag data inconsistencies.
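One way to surface bad data instead of silently defaulting (a sketch; the `logger` import and exact signature are assumptions):

```ts
export function getTaskEventStoreTableForRun(run: { taskEventStore?: string }): TaskEventStoreTable {
  const known = ["taskEvent", "taskEventPartitioned"];
  if (run.taskEventStore && !known.includes(run.taskEventStore)) {
    // Flag data inconsistencies rather than masking them with the default.
    logger.warn("Unknown taskEventStore value, falling back to taskEvent", {
      taskEventStore: run.taskEventStore,
    });
  }
  return run.taskEventStore === "taskEventPartitioned" ? "taskEventPartitioned" : "taskEvent";
}
```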
33-35: Ensure environment variable usage is consistent with boolean semantics.

Currently, partitioning is enabled by checking if `TASK_EVENT_PARTITIONING_ENABLED === "1"`. Consider using a more standard boolean environment variable (true/false) for clarity.
43-49: Optional: Add error handling or logging in the `create` method.

If the database call fails, the error is not caught or logged. Consider wrapping this call in a try/catch or letting the caller handle the error with more context.
54-60: Optional: Add error handling or logging in the `createMany` method.

Similar to `create`, consider adding a mechanism to log or handle failures when inserting multiple records.
62-115: Update code comment to match the actual time offset logic.

The comment at line 83 references adding "1 minute" when computing the `endCreatedAt` boundary. However, the code uses `env.TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS`, which might exceed 60 seconds. Consider updating the comment for clarity.
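The boundary logic in question, roughly (paraphrased from the query shown later in this thread):

```ts
// Events can be written shortly after a run completes, so the upper bound pads
// completedAt by the partition window rather than a fixed minute; with the
// default of 60 seconds the old "1 minute" comment happened to be accurate.
const endCreatedAtBoundary = endCreatedAt
  ? new Date(endCreatedAt.getTime() + env.TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS * 1000)
  : new Date();
```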
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- apps/webapp/app/v3/taskEventStore.server.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
apps/webapp/app/v3/taskEventStore.server.ts (1)
6-23: Use caution with partial and picked types for future schema extensions.

By `Omit`-ing and `Pick`-ing fields, additions to the underlying schema may need updates here. Ensure you maintain these types as the schema evolves.
```ts
async findTraceEvents(
  table: TaskEventStoreTable,
  traceId: string,
  startCreatedAt: Date,
  endCreatedAt?: Date
) {
  if (table === "taskEventPartitioned") {
    return await this.readReplica.$queryRaw<TraceEvent[]>`
      SELECT
        "spanId",
        "parentId",
        "runId",
        "idempotencyKey",
        LEFT(message, 256) as message,
        style,
        "startTime",
        duration,
        "isError",
        "isPartial",
        "isCancelled",
        level,
        events,
        "environmentType"
      FROM "TaskEventPartitioned"
      WHERE
        "traceId" = ${traceId}
        AND "createdAt" >= ${startCreatedAt.toISOString()}::timestamp
        AND "createdAt" < ${(endCreatedAt
          ? new Date(endCreatedAt.getTime() + env.TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS * 1000)
          : new Date()
        ).toISOString()}::timestamp
      ORDER BY "startTime" ASC
      LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}
    `;
  } else {
    return await this.readReplica.$queryRaw<TraceEvent[]>`
      SELECT
        id,
        "spanId",
        "parentId",
        "runId",
        "idempotencyKey",
        LEFT(message, 256) as message,
        style,
        "startTime",
        duration,
        "isError",
        "isPartial",
        "isCancelled",
        level,
        events,
        "environmentType"
      FROM "TaskEvent"
      WHERE "traceId" = ${traceId}
      ORDER BY "startTime" ASC
      LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}
    `;
  }
}
```
🛠️ Refactor suggestion
Align selected columns in partitioned vs. unpartitioned queries.

In partitioned mode, the `id` column is omitted, while the unpartitioned query includes it, even though the `TraceEvent` type excludes `id`. This discrepancy could cause confusion. Consider returning identical columns or updating the type.
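A minimal alignment, given that `TraceEvent` excludes `id`, would be to drop `id` from the unpartitioned branch as well (a sketch based on the code above):

```ts
// Unpartitioned branch with the column list matching the partitioned one:
return await this.readReplica.$queryRaw<TraceEvent[]>`
  SELECT
    "spanId", "parentId", "runId", "idempotencyKey",
    LEFT(message, 256) as message,
    style, "startTime", duration,
    "isError", "isPartial", "isCancelled",
    level, events, "environmentType"
  FROM "TaskEvent"
  WHERE "traceId" = ${traceId}
  ORDER BY "startTime" ASC
  LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}
`;
```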
… new runs are created (#1696)

* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created
* Add env var for partition window in seconds
* Make startCreatedAt required in task event store
* Uses image cards for the frameworks
* Removes old snippets
* New AI agents side menu section
* WIP adding new ai agent pages
* Better overview page
* More copy added to the agent example pages
* Copy improvements
* Removes "Creating a project" page and side menu section
* Fixes broken links
* Updates to the latest Mintlify version, fixes issues, changes theme
* Adds descriptions to the main dropdown menu items
* Reformatted Introduction docs 'landing page'
* Retry heartbeat timeouts by putting back in the queue (#1689)
* If there's a heartbeat error and no attempts we put it back in the queue to try again
* When nacking, return whether it was put back in the queue or not
* Try and nack, if it fails then fail the run
* Consolidated switch statement
* Fail executing/retrying runs
* OOM retrying on larger machines (#1691)
* OOM retrying on larger machines
* Create forty-windows-shop.md
* Update forty-windows-shop.md
* Only retry again if the machine is different from the original
* Kubernetes OOMs appear as non-zero sigkills, adding support for treating these as OOMs
* Complete the original attempt span if retrying due to an OOM
* Revert "Complete the original attempt span if retrying due to an OOM". This reverts commit 5f652c6.
* chore: Update version for release (#1666)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Release 3.3.14
* Set machine when triggering docs
* Batch queue runs that are waiting for deploy (#1693)
* Detect ffmpeg OOM errors, added manual OutOfMemoryError (#1694)
* Detect ffmpeg OOM errors, added manual OutOfMemoryError
* Create eighty-spies-knock.md
* Improved the machines docs, including the new OutOfMemoryError
* chore: Update version for release (#1695)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Release 3.3.15
* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created (#1696)
* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created
* Add env var for partition window in seconds
* Make startCreatedAt required in task event store
* Don't create an attempt if the run is final, batchTriggerAndWait bad continue fix (#1698)
* WIP fix for ResumeAttemptService selecting the wrong attempt (which has no error or output)
* Don't create an attempt if the run is already in a final status
* Don't get all the columns for the query. Improved the logging.
* Added a log to the batch example
* Filter out the undefined values
* Fix missing logs on child runs by using the root task run createdAt if it exists (#1697)
* Provider changes to support image cache (#1700)
* add env var for additional pull secrets
* make static images configurable
* optional image prefixes
* optional labels with sample rates
* add missing core paths
* remove excessive logs
* Fix run container exits after OOM retries (#1701)
* remove unused imports
* tell run to exit before force requeue
* handle exit for case where we already retried after oom
* improve retry span and add machine props
* don't try to exit run in dev
* Upgrade local dev to use electric beta.15 (#1699)
* Text fixes
* Removed pnpm files

---------

Co-authored-by: Matt Aitken <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eric Allam <[email protected]>
Co-authored-by: nicktrn <[email protected]>