Create new partitioned TaskEvent table, and switch to it gradually as new runs are created #1696

Merged: ericallam merged 3 commits into main from ea-branch-20 on Feb 12, 2025

Conversation

@ericallam (Member) commented on Feb 12, 2025

Summary by CodeRabbit

  • New Features
    • Introduced enhanced configuration options for task event partitioning, enabling optimized event management and improved performance.
    • Added new methods for managing task events, improving data retrieval and storage capabilities.
  • Refactor
    • Streamlined run and span inspection interfaces by removing extraneous link displays for a cleaner view.
    • Updated navigation and data retrieval workflows to deliver more accurate task and trace information.

changeset-bot (bot) commented on Feb 12, 2025

⚠️ No Changeset found

Latest commit: e353738

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai (bot, Contributor) commented on Feb 12, 2025

Walkthrough

This pull request removes the span link rendering functionality from several components by deleting the SpanLinkElement function and associated conditional logic in components like RunInspector.tsx, SpanInspector.tsx, and related route files. It also revises several presenters and services to incorporate a new task event store mechanism by using the getTaskEventStoreTableForRun function and additional time-based parameters. Furthermore, new environment configuration options are added, and database migrations/schema updates are introduced to support task event partitioning.

Changes

File(s) and change summary:

  • apps/webapp/app/components/runs/v3/RunInspector.tsx, apps/webapp/app/components/runs/v3/SpanInspector.tsx, apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx: Removed code for displaying span links and deleted the SpanLinkElement function, eliminating link handling in inspector components and routes.
  • apps/webapp/app/env.server.ts: Added new environment properties: TASK_EVENT_PARTITIONING_ENABLED (string, default "0") and TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS (integer, default 60).
  • apps/webapp/app/presenters/v3/RunPresenter.server.ts, apps/webapp/app/presenters/v3/SpanPresenter.server.ts: Modified query logic to use getTaskEventStoreTableForRun and to include additional parameters (createdAt, completedAt) when retrieving trace summaries and span data.
  • apps/webapp/app/utils/pathBuilder.ts, apps/webapp/app/utils/taskEvent.ts: Removed the v3TraceSpanPath function and eliminated the recordId property from span objects in the task event preparation functions.
  • apps/webapp/app/v3/eventRepository.server.ts, apps/webapp/app/v3/services/cancelAttempt.server.ts, apps/webapp/app/v3/services/cancelTaskRun.server.ts, apps/webapp/app/v3/services/completeAttempt.server.ts, apps/webapp/app/v3/services/crashTaskRun.server.ts, apps/webapp/app/v3/services/expireEnqueuedRun.server.ts, apps/webapp/app/v3/services/triggerTask.server.ts: Updated method signatures and calls to pass task run context by incorporating getTaskEventStoreTableForRun along with run timestamps, refactoring event querying, completion, and task run handling across multiple services.
  • apps/webapp/app/v3/taskEventStore.server.ts: New file that manages task events. It introduces types (CommonTaskEvent, TraceEvent, TaskEventStoreTable), helper functions (getTaskEventStoreTableForRun, getTaskEventStore), and the TaskEventStore class with methods to create and query task events (see the sketch after this list).
  • Database migrations and schema: Added a new migration script to create the partitioned table TaskEventPartitioned with related indexes, and modified the TaskRun table by adding the taskEventStore column. The Prisma schema is updated with a new TaskEventPartitioned model and a corresponding field on TaskRun.
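For orientation, here is a minimal sketch of what the two helpers in taskEventStore.server.ts might look like, assuming the behaviour described elsewhere in this review (partitioning toggled when TASK_EVENT_PARTITIONING_ENABLED is "1", and a taskEventStore column on TaskRun defaulting to "taskEvent"). It is illustrative, not the actual implementation:

// Illustrative sketch only; the real module is apps/webapp/app/v3/taskEventStore.server.ts
// and may differ. The import path for env is assumed.
import { env } from "~/env.server";

export type TaskEventStoreTable = "taskEvent" | "taskEventPartitioned";

// New runs are written to the partitioned store only when the flag is enabled.
export function getTaskEventStore(): TaskEventStoreTable {
  return env.TASK_EVENT_PARTITIONING_ENABLED === "1" ? "taskEventPartitioned" : "taskEvent";
}

// Existing runs keep reading from whichever store they were created with,
// so the switch-over happens gradually as new runs are created.
export function getTaskEventStoreTableForRun(run: {
  taskEventStore?: string | null;
}): TaskEventStoreTable {
  return run.taskEventStore === "taskEventPartitioned" ? "taskEventPartitioned" : "taskEvent";
}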

Sequence Diagram(s)

sequenceDiagram
    participant Service as Service/Presenter
    participant Resolver as getTaskEventStoreTableForRun
    participant Repo as EventRepository
    participant DB as Database

    Service->>Resolver: Determine event store table for run
    Resolver-->>Service: Return table name
    Service->>Repo: Call updated query/completion with (table, runId, createdAt, completedAt)
    Repo->>DB: Execute query/command on the specified table
    DB-->>Repo: Return events/result
    Repo-->>Service: Return processed data

Poem

I’m a rabbit hopping through the code,
Leaving link traces in the dust mode.
Span elements vanished in a blink,
Task events now have more time to think.
With schema and queries primed anew,
Celebrate the changes – for code, we grew! 🐰✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

apps/webapp/app/v3/taskEventStore.server.ts

Oops! Something went wrong! :(

ESLint: 8.45.0

ESLint couldn't find the config "custom" to extend from. Please check that the name of the config is correct.

The config "custom" was referenced from the config file in "/.eslintrc.js".

If you still have problems, please stop by https://eslint.org/chat/help to chat with the team.


@coderabbitai (bot, Contributor) left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (2)
apps/webapp/app/v3/eventRepository.server.ts (1)

231-285: ⚠️ Potential issue

Synchronize partial events resolution
When completeEvent queries incomplete events and then inserts a final record, a race condition could occur if multiple requests handle the same span in parallel. Consider adding an upsert or a guard to avoid duplicate finalization.

+ // Suggestion: Add a check to ensure that only one thread can finalize a span at a time
+ // through a transactional or upsert approach.
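One possible shape for such a guard, sketched with Prisma against the TaskEvent model (the transaction isolation level and function name are assumptions, not the repository's current code):

import { Prisma, PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Sketch: finalize a span at most once, even if two requests race on the same span.
export async function finalizeSpanOnce(spanId: string, finalEvent: Prisma.TaskEventCreateInput) {
  return prisma.$transaction(
    async (tx) => {
      // Check whether a non-partial (final) record already exists for this span.
      const existing = await tx.taskEvent.findFirst({
        where: { spanId, isPartial: false },
        select: { id: true },
      });
      if (existing) {
        return existing; // another request finalized this span first; skip the duplicate insert
      }
      return tx.taskEvent.create({ data: finalEvent });
    },
    // Serializable isolation makes the check-then-insert safe (retry on serialization failures);
    // a partial unique index on spanId WHERE "isPartial" = false plus an upsert would also work.
    { isolationLevel: Prisma.TransactionIsolationLevel.Serializable }
  );
}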
apps/webapp/app/v3/services/cancelTaskRun.server.ts (1)

87-104: 🛠️ Refactor suggestion

Deduplicate event cancellation logic.

This event cancellation logic is duplicated in cancelAttempt.server.ts. Consider extracting it into a shared service.

Create a new shared service CancelEventsService that both CancelTaskRunService and CancelAttemptService can use:

// cancelEvents.server.ts
export class CancelEventsService extends BaseService {
  public async call(
    taskRun: { friendlyId: string; createdAt: Date; completedAt: Date | null },
    cancelledAt: Date,
    reason: string
  ) {
    const inProgressEvents = await eventRepository.queryIncompleteEvents(
      getTaskEventStoreTableForRun(taskRun),
      { runId: taskRun.friendlyId },
      taskRun.createdAt,
      taskRun.completedAt ?? undefined
    );

    logger.debug("Cancelling in-progress events", {
      inProgressEvents: inProgressEvents.map((event) => event.id),
    });

    await Promise.all(
      inProgressEvents.map((event) =>
        eventRepository.cancelEvent(event, cancelledAt, reason)
      )
    );
  }
}
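If extracted, both callers could delegate along these lines (the constructor arguments and reason string here are illustrative):

// Inside CancelTaskRunService.call() or CancelAttemptService.call(); sketch only.
const cancelEvents = new CancelEventsService();
await cancelEvents.call(
  {
    friendlyId: taskRun.friendlyId,
    createdAt: taskRun.createdAt,
    completedAt: taskRun.completedAt,
  },
  new Date(),
  "Run cancelled"
);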
🧹 Nitpick comments (15)
apps/webapp/app/v3/services/triggerTask.server.ts (2)

398-398: Consider adding error handling for getTaskEventStore().

The getTaskEventStore() call could potentially fail. Consider wrapping it in a try-catch block or handling potential errors to ensure graceful degradation.

-                    taskEventStore: getTaskEventStore(),
+                    taskEventStore: await this.#handleTaskEventStore(),

Add this private method to handle the task event store:

async #handleTaskEventStore() {
  try {
    return getTaskEventStore();
  } catch (error) {
    logger.error("Failed to get task event store", { error });
    return undefined;
  }
}

62-649: Consider breaking down the call method into smaller, focused methods.

The call method is quite long (587 lines) and handles multiple concerns. Consider extracting the following responsibilities into separate private methods:

  1. Validation logic
  2. Task run creation
  3. Queue management
  4. Dependency handling

This would improve maintainability, testability, and readability.

Example refactor for validation logic:

private async validateTaskTrigger(
  taskId: string,
  environment: AuthenticatedEnvironment,
  body: TriggerTaskRequestBody,
  options: TriggerTaskServiceOptions
): Promise<void> {
  // Extract validation logic here
  if (environment.type !== "DEVELOPMENT" && !options.skipChecks) {
    const result = await getEntitlement(environment.organizationId);
    if (result && result.hasAccess === false) {
      throw new OutOfEntitlementError();
    }
  }
  // ... other validation logic
}
🧰 Tools
🪛 Biome (1.9.4)

[error] 254-255: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

apps/webapp/app/v3/taskEventStore.server.ts (3)

1-35: Ensure clear distinction between standard and partitioned usage
Overall, the definitions and exported helpers for determining which table to use look good. Consider documenting the scenarios when each table is selected to prevent confusion among contributors about why or how the environment variable toggles partitioning.


72-115: Revisit readReplica usage and date-range conditions
When querying partitioned tables, reads are routed to the readReplica. With eventual consistency, there can be a small delay before data becomes visible. Confirm that consistency requirements are relaxed enough to allow reads from the replica, especially since date filtering might include or exclude rows right at the window boundaries.

Do you want help writing a verification script to confirm the replication lag or indexing overhead for these time-based queries?


117-176: Check for potential large result sets in findTraceEvents
The raw SQL query with LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT} helps cap the result size, but if traceId references a high-traffic process, you could still have large partial sets to process. Consider implementing pagination or streaming if you foresee extremely large trace detail queries.
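If pagination becomes necessary, a keyset approach over startTime is one option. The sketch below assumes startTime is a BigInt column and that ties are rare (a real implementation would add a tie-breaker column); it is not the repository's API:

import { PrismaClient } from "@prisma/client";

// Stream trace events in pages instead of one capped query; illustrative only.
async function* streamTraceEvents(readReplica: PrismaClient, traceId: string, pageSize = 1000) {
  let cursor: bigint | null = null; // last startTime seen in the previous page
  for (;;) {
    const page = await readReplica.$queryRaw<Array<{ spanId: string; startTime: bigint }>>`
      SELECT "spanId", "startTime"
      FROM "TaskEvent"
      WHERE "traceId" = ${traceId}
        AND (${cursor}::bigint IS NULL OR "startTime" > ${cursor})
      ORDER BY "startTime" ASC
      LIMIT ${pageSize}
    `;
    if (page.length === 0) return;
    yield page;
    cursor = page[page.length - 1].startTime;
  }
}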

apps/webapp/app/v3/eventRepository.server.ts (3)

361-421: Clarify logic for “allowCompleteDuplicate”
The queryIncompleteEvents method filters out events that have a completed duplicate. This logic is beneficial, but it might be unclear to new contributors. Consider adding a brief comment explaining how “allowCompleteDuplicate” changes the filtering outcome.


423-646: Potential performance constraints in large trace retrieval
getTraceSummary, getRunEvents, and getSpan combine to build robust trace overviews. If used for extremely large runs, performance might degrade. Evaluate whether you need pagination or summarization for extremely large traces or runs beyond the current limit approach.


647-787: Prevent infinite ancestor walks
#createSpanFromEvent and #walkSpanAncestors rely on a level >= 8 safeguard to stop indefinite recursion. Confirm that 8 is the correct threshold or consider a configurable approach for more elaborate nested calls.
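If a configurable cap is preferred over the hard-coded 8, it could look roughly like this (the env var name and types are made up for illustration):

// Depth-capped ancestor walk; stops on a missing parent instead of walking indefinitely.
type SpanNode = { spanId: string; parentId: string | null };

const MAX_ANCESTOR_DEPTH = Number(process.env.MAXIMUM_SPAN_ANCESTOR_DEPTH ?? "8");

function walkAncestors(
  start: SpanNode,
  byId: Map<string, SpanNode>,
  visit: (span: SpanNode, level: number) => void
) {
  let current: SpanNode | undefined = start;
  for (let level = 1; current?.parentId && level <= MAX_ANCESTOR_DEPTH; level++) {
    current = byId.get(current.parentId);
    if (!current) return; // broken chain: bail out rather than loop forever
    visit(current, level);
  }
}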

apps/webapp/app/routes/resources.runs.$runParam.logs.download.ts (1)

35-40: Add error handling for undefined dates.

While the implementation is correct, it would be beneficial to add validation for the date parameters to ensure they are defined and in the correct order.

   const runEvents = await eventRepository.getRunEvents(
     getTaskEventStoreTableForRun(run),
     run.friendlyId,
+    // Ensure createdAt is defined
+    run.createdAt ?? new Date(0),
+    // Ensure completedAt is after createdAt
+    run.completedAt && run.completedAt > run.createdAt ? run.completedAt : undefined
-    run.createdAt,
-    run.completedAt ?? undefined
   );
apps/webapp/app/v3/services/cancelAttempt.server.ts (1)

76-93: Consider extracting event cancellation logic to a separate method.

The event querying and cancellation logic could be extracted into a dedicated method for better organization and reusability.

+  private async cancelInProgressEvents(
+    taskRun: { friendlyId: string; createdAt: Date; completedAt: Date | null },
+    cancelledAt: Date,
+    reason: string
+  ) {
+    const inProgressEvents = await eventRepository.queryIncompleteEvents(
+      getTaskEventStoreTableForRun(taskRun),
+      { runId: taskRun.friendlyId },
+      taskRun.createdAt,
+      taskRun.completedAt ?? undefined
+    );
+
+    logger.debug("Cancelling in-progress events", {
+      inProgressEvents: inProgressEvents.map((event) => event.id),
+    });
+
+    await Promise.all(
+      inProgressEvents.map((event) =>
+        eventRepository.cancelEvent(event, cancelledAt, reason)
+      )
+    );
+  }
apps/webapp/app/presenters/v3/RunPresenter.server.ts (1)

111-116: Consider adding type safety for taskEventStore field.

The taskEventStore field should be properly typed to ensure type safety throughout the application.

+type TaskEventStore = {
+  taskEventStore: string;
+  createdAt: Date;
+  completedAt: Date | null;
+};
+
 const traceSummary = await eventRepository.getTraceSummary(
   getTaskEventStoreTableForRun(run as TaskEventStore),
   run.traceId,
   run.createdAt,
   run.completedAt ?? undefined
 );
apps/webapp/app/env.server.ts (2)

485-486: Add documentation for the new environment variables.

Consider adding JSDoc comments to explain:

  • The purpose and impact of enabling task event partitioning
  • The significance of the partitioned window duration
  • Any performance implications or recommendations
+/**
+ * Controls whether task events are stored in partitioned tables.
+ * @default "0" (disabled)
+ */
 TASK_EVENT_PARTITIONING_ENABLED: z.string().default("0"),
+/**
+ * The time window in seconds for each task event partition.
+ * @default 60 (1 minute)
+ */
 TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS: z.coerce.number().int().default(60), // 1 minute

485-485: Consider using boolean coercion for the partitioning flag.

For better type safety and consistency with boolean values, consider using z.coerce.boolean() instead of string.

-TASK_EVENT_PARTITIONING_ENABLED: z.string().default("0"),
+TASK_EVENT_PARTITIONING_ENABLED: z.coerce.boolean().default(false),
apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)

75-75: Consider removing unused taskEventStore selection.

The taskEventStore field is selected in the query but not used in the subsequent code. Consider removing it if it's not needed for the current implementation.

internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1)

1-54: Well-designed partitioned table for task events.

The table structure is well thought out:

  1. Range partitioning on createdAt is perfect for time-series data
  2. Composite primary key (id, createdAt) supports partitioning
  3. Fields comprehensively cover all event tracking needs

Consider creating partition maintenance procedures to:

  • Automatically create new partitions ahead of time
  • Archive/drop old partitions for data retention
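A maintenance job along these lines could run on a schedule; everything here (partition naming, daily granularity, retention handling) is an assumption for illustration, not part of this PR:

import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// UTC day boundaries offset from today; used to name and bound daily partitions.
function dayBounds(offsetDays: number) {
  const start = new Date();
  start.setUTCHours(0, 0, 0, 0);
  start.setUTCDate(start.getUTCDate() + offsetDays);
  const end = new Date(start);
  end.setUTCDate(end.getUTCDate() + 1);
  return { start, end };
}

function partitionName(day: Date) {
  return `TaskEventPartitioned_${day.toISOString().slice(0, 10).replace(/-/g, "_")}`;
}

// Create tomorrow's partition ahead of time so inserts never hit a missing range.
export async function createUpcomingPartition() {
  const { start, end } = dayBounds(1);
  await prisma.$executeRawUnsafe(
    `CREATE TABLE IF NOT EXISTS "${partitionName(start)}"
       PARTITION OF "TaskEventPartitioned"
       FOR VALUES FROM ('${start.toISOString()}') TO ('${end.toISOString()}')`
  );
}

// Detach and drop a partition that has aged out of the retention window.
export async function dropExpiredPartition(retentionDays: number) {
  const { start } = dayBounds(-retentionDays);
  const name = partitionName(start);
  await prisma.$executeRawUnsafe(`ALTER TABLE "TaskEventPartitioned" DETACH PARTITION "${name}"`);
  await prisma.$executeRawUnsafe(`DROP TABLE IF EXISTS "${name}"`);
}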
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0726af and b734ee2.

📒 Files selected for processing (21)
  • apps/webapp/app/components/runs/v3/RunInspector.tsx (0 hunks)
  • apps/webapp/app/components/runs/v3/SpanInspector.tsx (2 hunks)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/presenters/v3/RunPresenter.server.ts (3 hunks)
  • apps/webapp/app/presenters/v3/SpanPresenter.server.ts (5 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.traces.$traceId.spans.$spanId/route.tsx (0 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (0 hunks)
  • apps/webapp/app/routes/resources.runs.$runParam.logs.download.ts (2 hunks)
  • apps/webapp/app/utils/pathBuilder.ts (0 hunks)
  • apps/webapp/app/utils/taskEvent.ts (0 hunks)
  • apps/webapp/app/v3/eventRepository.server.ts (17 hunks)
  • apps/webapp/app/v3/services/cancelAttempt.server.ts (2 hunks)
  • apps/webapp/app/v3/services/cancelTaskRun.server.ts (2 hunks)
  • apps/webapp/app/v3/services/completeAttempt.server.ts (5 hunks)
  • apps/webapp/app/v3/services/crashTaskRun.server.ts (2 hunks)
  • apps/webapp/app/v3/services/expireEnqueuedRun.server.ts (2 hunks)
  • apps/webapp/app/v3/services/triggerTask.server.ts (2 hunks)
  • apps/webapp/app/v3/taskEventStore.server.ts (1 hunks)
  • internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1 hunks)
  • internal-packages/database/prisma/migrations/20250212075957_add_task_event_store_to_task_run/migration.sql (1 hunks)
  • internal-packages/database/prisma/schema.prisma (2 hunks)
💤 Files with no reviewable changes (5)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.traces.$traceId.spans.$spanId/route.tsx
  • apps/webapp/app/utils/taskEvent.ts
  • apps/webapp/app/utils/pathBuilder.ts
  • apps/webapp/app/components/runs/v3/RunInspector.tsx
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (16)
apps/webapp/app/v3/services/triggerTask.server.ts (1)

32-32: LGTM!

The import statement for getTaskEventStore is correctly placed and follows the project's import style.

apps/webapp/app/components/runs/v3/SpanInspector.tsx (3)

1-1: LGTM! Import changes align with component requirements.

The import changes correctly support the component's functionality:

  • Core utilities for duration formatting
  • Spinner component for pending state visualization
  • Simplified path utilities after link handling removal

Also applies to: 8-8, 20-20


219-264: LGTM! SpanTimeline improvements enhance user experience.

The SpanTimeline component effectively:

  • Provides clear visual feedback for pending state using the Spinner
  • Maintains consistent time formatting
  • Handles all states (pending, complete, error) appropriately

25-210: Verify the impact of link removal on user navigation.

While the component's core functionality remains intact, the removal of link handling might affect user navigation patterns. Please ensure that:

  1. Users can still effectively navigate between related spans
  2. The removal of links doesn't break any critical user journeys
  3. Alternative navigation paths are available where needed

Let's verify the impact with this script:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check for any remaining link-related code or dependencies

# Search for any remaining link-related patterns
echo "Checking for remaining link patterns..."
rg -i "link" --type tsx --type ts

# Search for navigation patterns to ensure alternatives exist
echo "Checking navigation patterns..."
rg -i "navigate|route|path" --type tsx --type ts

Length of output: 302


I'll now run an additional script to search for navigation-related components—specifically looking for the usage of the TextLink component and the v3RunsPath function—to confirm that navigation remains in place despite the removal of legacy link-handling code.


🏁 Script executed:

#!/bin/bash
echo "Searching for 'TextLink' usage..."
rg -i "TextLink"
echo "Searching for 'v3RunsPath' usage..."
rg -i "v3RunsPath"

Length of output: 16320


Navigation remains intact despite link removal changes.

Our analysis confirmed that the TextLink component and the v3RunsPath function are actively used in multiple areas—including in the SpanInspector component via the "Task" property—indicating that users can still navigate effectively. The current implementation supports navigation with updated patterns, and no critical issues were found.

apps/webapp/app/v3/taskEventStore.server.ts (1)

37-60: Validate inputs before inserting
The create and createMany methods rely on trusted data. Consider adding assertion checks or validations for critical fields (e.g., ensuring data or array length is not empty) before writing to the database to reduce the likelihood of invalid records.

apps/webapp/app/v3/eventRepository.server.ts (2)

37-106: Partitioning checks for partial events
Introducing the TaskEventStore property and the partitioning flag is a good approach. As you rely on the environment variable for toggling, confirm that higher-level code (such as migrations or data backfill routines) aligns with whether partitioning is expected.


1078-1096: Ensure consistent usage of taskEventStoreTable
The private get taskEventStoreTable() fallback and subsequent use in createMany is a neat approach. Double-check references in other code paths to confirm consistent usage across your entire codebase so the partitioning logic is never bypassed inadvertently.

apps/webapp/app/v3/services/expireEnqueuedRun.server.ts (2)

7-7: Confirm import location
The import of getTaskEventStoreTableForRun appears correct. If you rename or refactor the store logic in the future, ensure the path remains consistent across services like ExpireEnqueuedRunService or CancelTaskRunService.


81-103: Complete event logic
Using eventRepository.completeEvent with getTaskEventStoreTableForRun(run) is good for correctness under partitioned scenarios. However, ensure that if run.createdAt is far in the past, no relevant events get excluded by your date constraints in event repository queries.

apps/webapp/app/v3/services/crashTaskRun.server.ts (1)

123-131: LGTM! Proper integration of task event store partitioning.

The changes correctly integrate the new task event store functionality while maintaining existing error handling and logging. The addition of timestamps provides better context for event queries.

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)

210-216: LGTM! Consistent implementation of task event store partitioning.

The changes correctly integrate the new task event store functionality while maintaining existing error handling and data transformation logic.

Also applies to: 366-372

apps/webapp/app/v3/services/completeAttempt.server.ts (1)

168-186: LGTM! Comprehensive integration of task event store partitioning.

The changes correctly integrate the new task event store functionality across all paths (success, failure, and system failure), while maintaining:

  • Existing error handling
  • Retry logic
  • Event completion semantics

Also applies to: 317-337, 379-386, 414-435

internal-packages/database/prisma/migrations/20250212075957_add_task_event_store_to_task_run/migration.sql (1)

1-5: LGTM! The migration safely adds the task event store column.

The migration correctly adds the taskEventStore column with a default value, ensuring backward compatibility with existing rows.

internal-packages/database/prisma/migrations/20250212053026_create_task_event_partitioned_table/migration.sql (1)

56-63: Appropriate indexes for common query patterns.

The indexes on traceId, spanId, and runId will support efficient:

  • Trace/span lookups for distributed tracing
  • Run-specific event queries
✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Verify if similar indexes exist on the original TaskEvent table for consistency
rg -A 5 'CREATE INDEX.*TaskEvent.*(traceId|spanId|runId)'

Length of output: 1618


Index Consistency Verified
The migration file for the partitioned table correctly creates indexes on traceId, spanId, and runId, matching the indexes previously established in the TaskEvent table migrations. This ensures that query patterns for distributed tracing and run-specific events are consistently optimized.

  • The partitioned table (TaskEventPartitioned) defines indexes analogous to those on the original table (TaskEvent).
  • Shell script results confirm that index creation for these columns exists in both places.
internal-packages/database/prisma/schema.prisma (2)

1732-1732: LGTM! TaskRun model updated with event store tracking.

The taskEventStore field is correctly added with a default value of "taskEvent", matching the migration.


2710-2805: Well-structured TaskEventPartitioned model.

The model correctly implements the partitioned table with:

  1. Composite primary key using @@id([id, createdAt])
  2. Matching indexes from the migration
  3. Field definitions consistent with the original TaskEvent model

Let's verify the field consistency between TaskEvent and TaskEventPartitioned models:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Extract and compare field definitions between TaskEvent and TaskEventPartitioned
ast-grep --pattern $'model TaskEvent {
  $$$
}'

ast-grep --pattern $'model TaskEventPartitioned {
  $$$
}'

Length of output: 108


I'll generate a revised script to extract and display the full definitions of both models using rg. This should help us manually verify the field consistency between the TaskEvent and TaskEventPartitioned models.

Please run the following shell script:

Once we have the output from these commands, we can compare the field definitions to confirm consistency.


🏁 Script executed:

#!/bin/bash
echo "==== TaskEvent model definition ===="
rg "model TaskEvent {" -A 200 internal-packages/database/prisma/schema.prisma

echo "==== TaskEventPartitioned model definition ===="
rg "model TaskEventPartitioned {" -A 200 internal-packages/database/prisma/schema.prisma

Length of output: 698


Let's try a new script using fixed-string search with rg -F to avoid regex errors. Please run the following shell script to extract the full definitions of both models:

This will help us directly verify field consistency between the two models.


🏁 Script executed:

#!/bin/bash
echo "==== TaskEvent model definition ===="
rg -F "model TaskEvent {" -A 300 internal-packages/database/prisma/schema.prisma

echo "==== TaskEventPartitioned model definition ===="
rg -F "model TaskEventPartitioned {" -A 300 internal-packages/database/prisma/schema.prisma

Length of output: 11720


Field consistency verified between TaskEvent and TaskEventPartitioned

The field definitions align exactly between the two models except for the intentional change in the primary key—in TaskEvent, a single field is marked as the primary key, while in TaskEventPartitioned a composite primary key (id and createdAt) is used to support partitioning. This confirms that the partitioned model correctly mirrors the original TaskEvent model’s structure with the planned adjustments.

@coderabbitai (bot, Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
apps/webapp/app/v3/taskEventStore.server.ts (5)

25-31: Consider handling unknown or malformed taskEventStore values more explicitly.
If run.taskEventStore contains an unexpected string, it silently defaults to "taskEvent". Consider validating the field or logging a warning to flag data inconsistencies.


33-35: Ensure environment variable usage is consistent with boolean semantics.
Currently, partitioning is enabled by checking if TASK_EVENT_PARTITIONING_ENABLED === "1". Consider using a more standard boolean environment variable (true/false) for clarity.


43-49: Optional: Add error handling or logging in the create method.
If the database call fails, the error is not caught or logged. Consider wrapping this call in a try/catch or letting the caller handle the error with more context.


54-60: Optional: Add error handling or logging in the createMany method.
Similar to create, consider adding a mechanism to log or handle failures when inserting multiple records.


62-115: Update code comment to match the actual time offset logic.
The comment at line 83 references adding “1 minute” when computing the endCreatedAt boundary. However, the code uses env.TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS, which might exceed 60 seconds. Consider updating the comment for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b734ee2 and e353738.

📒 Files selected for processing (1)
  • apps/webapp/app/v3/taskEventStore.server.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
apps/webapp/app/v3/taskEventStore.server.ts (1)

6-23: Use caution with partial and picked types for future schema extensions.
By Omit-ing and Pick-ing fields, additions to the underlying schema may need updates here. Ensure you maintain these types as the schema evolves.

Comment on lines +117 to +175
async findTraceEvents(
  table: TaskEventStoreTable,
  traceId: string,
  startCreatedAt: Date,
  endCreatedAt?: Date
) {
  if (table === "taskEventPartitioned") {
    return await this.readReplica.$queryRaw<TraceEvent[]>`
      SELECT
        "spanId",
        "parentId",
        "runId",
        "idempotencyKey",
        LEFT(message, 256) as message,
        style,
        "startTime",
        duration,
        "isError",
        "isPartial",
        "isCancelled",
        level,
        events,
        "environmentType"
      FROM "TaskEventPartitioned"
      WHERE
        "traceId" = ${traceId}
        AND "createdAt" >= ${startCreatedAt.toISOString()}::timestamp
        AND "createdAt" < ${(endCreatedAt
          ? new Date(endCreatedAt.getTime() + env.TASK_EVENT_PARTITIONED_WINDOW_IN_SECONDS * 1000)
          : new Date()
        ).toISOString()}::timestamp
      ORDER BY "startTime" ASC
      LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}
    `;
  } else {
    return await this.readReplica.$queryRaw<TraceEvent[]>`
      SELECT
        id,
        "spanId",
        "parentId",
        "runId",
        "idempotencyKey",
        LEFT(message, 256) as message,
        style,
        "startTime",
        duration,
        "isError",
        "isPartial",
        "isCancelled",
        level,
        events,
        "environmentType"
      FROM "TaskEvent"
      WHERE "traceId" = ${traceId}
      ORDER BY "startTime" ASC
      LIMIT ${env.MAXIMUM_TRACE_SUMMARY_VIEW_COUNT}
    `;
  }
}
@coderabbitai (bot, Contributor) left a review comment:

🛠️ Refactor suggestion

Align selected columns in partitioned vs. unpartitioned queries.
In partitioned mode, the id column is omitted, while the unpartitioned query includes it, even though TraceEvent type excludes id. This discrepancy could cause confusion. Consider returning identical columns or updating the type.
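One way to keep the two branches in lockstep is to share the projection, for example with Prisma.sql (sketch only, not the current code; the fragment name is made up):

import { Prisma } from "@prisma/client";

// Single source of truth for the columns both branches select, matching the TraceEvent type.
const traceEventColumns = Prisma.sql`
  "spanId", "parentId", "runId", "idempotencyKey",
  LEFT(message, 256) AS message, style, "startTime", duration,
  "isError", "isPartial", "isCancelled", level, events, "environmentType"
`;

// Both queries would then read: SELECT ${traceEventColumns} FROM "TaskEventPartitioned" ...
// and SELECT ${traceEventColumns} FROM "TaskEvent" ..., so the result shapes cannot drift.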

@ericallam merged commit bd0cc54 into main on Feb 12, 2025
11 checks passed
@ericallam deleted the ea-branch-20 branch on February 12, 2025 at 09:12
samejr pushed a commit that referenced this pull request Feb 13, 2025
… new runs are created (#1696)

* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created

* Add env var for partition window in seconds

* Make startCreatedAt required in task event store
@coderabbitai (bot) mentioned this pull request on Feb 13, 2025
samejr added a commit that referenced this pull request Feb 13, 2025
* Uses image cards for the frameworks

* Removes old snippets

* New AI agents side menu section

* WIP adding new ai agent pages

* Better overview page

* More copy added to the agent example pages

* Copy improvements

* Removes “Creating a project” page and side menu section

* Fixes broken links

* Updates to the latest Mintlify version, fixes issues, changes theme

* Adds descriptions to the main dropdown menu items

* Reformatted Introduction docs ‘landing page’

* Retry heartbeat timeouts by putting back in the queue (#1689)

* If there’s a heartbeat error and no attempts we put it back in the queue to try again

* When nacking, return whether it was put back in the queue or not

* Try and nack, if it fails then fail the run

* Consolidated switch statement

* Fail executing/retrying runs

* OOM retrying on larger machines (#1691)

* OOM retrying on larger machines

* Create forty-windows-shop.md

* Update forty-windows-shop.md

* Only retry again if the machine is different from the original

* Kubernetes OOMs appear as non-zero sigkills, adding support for treating these as OOMs

* Complete the original attempt span if retrying due to an OOM

* Revert "Complete the original attempt span if retrying due to an OOM"

This reverts commit 5f652c6.

* chore: Update version for release (#1666)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Release 3.3.14

* Set machine when triggering docs

* Batch queue runs that are waiting for deploy (#1693)

* Detect ffmpeg OOM errors, added manual OutOfMemoryError (#1694)

* Detect ffmpeg OOM errors, added manual OutOfMemoryError

* Create eighty-spies-knock.md

* Improved the machines docs, including the new OutOfMemoryError

* chore: Update version for release (#1695)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Release 3.3.15

* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created (#1696)

* Create new partitioned TaskEvent table, and switch to it gradually as new runs are created

* Add env var for partition window in seconds

* Make startCreatedAt required in task event store

* Don't create an attempt if the run is final, batchTriggerAndWait bad continue fix (#1698)

* WIP fix for ResumeAttemptService selecting the wrong attempt (which has no error or output)

* Don’t create an attempt if the run is already in a final status

* Don’t get all the columns for the query. Improved the logging.

* Added a log to the batch example

* Filter out the undefined values

* Fix missing logs on child runs by using the root task run createdAt if it exists (#1697)

* Provider changes to support image cache (#1700)

* add env var for additional pull secrets

* make static images configurable

* optional image prefixes

* optional labels with sample rates

* add missing core paths

* remove excessive logs

* Fix run container exits after OOM retries (#1701)

* remove unused imports

* tell run to exit before force requeue

* handle exit for case where we already retried after oom

* improve retry span and add machine props

* don't try to exit run in dev

* Upgrade local dev to use electric beta.15 (#1699)

* Text fixes

* Removed pnpm files

---------

Co-authored-by: Matt Aitken <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eric Allam <[email protected]>
Co-authored-by: nicktrn <[email protected]>