Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run Engine 2.0 (alpha) #1575

Open
wants to merge 434 commits into
base: main
Choose a base branch
from
Open

Run Engine 2.0 (alpha) #1575

wants to merge 434 commits into from

Conversation

nicktrn
Copy link
Collaborator

@nicktrn nicktrn commented Dec 17, 2024

Good luck @coderabbitai

Summary by CodeRabbit

  • New Features
    • Introduced a "Run Engine 2.0 (alpha)" mode.
    • UI now displays worker type information in deployment details.
    • Added keyboard shortcut support—for example, the Cancel run button now responds to “C” and other controls accept shortcuts.
    • New icon graphics are used to signal task-cached runs.
    • Updated input validation now allows more flexible Run and Batch ID formats.

matt-aitken and others added 30 commits November 13, 2024 13:00
matt-aitken and others added 28 commits January 31, 2025 11:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 25

🔭 Outside diff range comments (2)
apps/webapp/app/components/runs/v3/RunFilters.tsx (1)

884-886: Inconsistent batch ID validation.

The batch ID validation in this file still enforces a strict 27-character length, while BatchFilters.tsx accepts both 27 and 31 characters. This inconsistency could lead to validation errors.

Apply this diff to align the validation with BatchFilters.tsx:

-    } else if (batchId.length !== 27) {
-      error = "Batch IDs are 27 characters long";
+    } else if (batchId.length !== 27 && batchId.length !== 31) {
+      error = "Batch IDs are 27/31 characters long";
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1)

503-524: Add error handling for unknown message types.

The switch statement for message types doesn't handle unknown types. Consider adding a default case to handle potential future message types gracefully.

   switch (data.type) {
     case "EXECUTE": {
       return await this.#handleExecuteMessage(message, data);
     }
     case "RESUME": {
       return await this.#handleResumeMessage(message, data);
     }
     case "RESUME_AFTER_DURATION": {
       return await this.#handleResumeAfterDurationMessage(message, data);
     }
     case "FAIL": {
       return await this.#handleFailMessage(message, data);
     }
+    default: {
+      logger.error("Unknown message type received", {
+        type: data.type,
+        messageId: message.messageId,
+      });
+      return {
+        action: "ack_and_do_more_work",
+        reason: "unknown_message_type",
+        attrs: { message_type: data.type },
+      };
+    }
   }
♻️ Duplicate comments (3)
apps/webapp/app/v3/runQueue.server.ts (1)

16-25: 🛠️ Refactor suggestion

Apply consistent error handling pattern to all concurrency functions.

The same error handling improvements should be applied to updateQueueConcurrencyLimits and removeQueueConcurrencyLimits functions.

Also applies to: 27-36

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1)

54-60: 🛠️ Refactor suggestion

Enhance error handling to prevent sensitive information exposure.

Similar to the wait.duration.ts route, the error handling should be improved to prevent exposing sensitive details.

Apply this diff to improve error handling:

    } catch (error) {
      logger.error("Failed to complete dev attempt", {
        environmentId: authentication.environment.id,
        error,
      });
-     throw error;
+     throw new Response("Failed to process request", { status: 500 });
    }
.github/workflows/unit-tests.yml (1)

27-33: ⚠️ Potential issue

Pin the docker/login-action to a specific commit hash.

Using an unpinned version tag (@v3) for third-party actions is a security risk. Pin to a specific commit hash to prevent supply chain attacks.

-        uses: docker/login-action@v3
+        uses: docker/login-action@343f7c4344506bcbf9b4de18042ae17996df046d # v3.0.0
🧰 Tools
🪛 GitHub Check: CodeQL

[warning] 29-29: Unpinned tag for a non-immutable Action in workflow
Unpinned 3rd party Action '🧪 Unit Tests' step Uses Step uses 'docker/login-action' with ref 'v3', not a pinned commit hash

🧹 Nitpick comments (69)
apps/webapp/app/v3/services/initializeDeployment.server.ts (1)

55-72: Consider extracting image tag construction logic.

The image tag construction logic could be moved to a separate method to improve readability and maintainability.

+  private constructImageTag(
+    payload: InitializeDeploymentRequestBody,
+    environment: AuthenticatedEnvironment,
+    nextVersion: string
+  ) {
+    const sharedImageTag = `${payload.namespace ?? env.DEPLOY_REGISTRY_NAMESPACE}/${
+      environment.project.externalRef
+    }:${nextVersion}.${environment.slug}`;
+
+    const unmanagedImageParts = [];
+    if (payload.registryHost) {
+      unmanagedImageParts.push(payload.registryHost);
+    }
+    if (payload.namespace) {
+      unmanagedImageParts.push(payload.namespace);
+    }
+    unmanagedImageParts.push(
+      `${environment.project.externalRef}:${nextVersion}.${environment.slug}`
+    );
+
+    return {
+      sharedImageTag,
+      unmanagedImageTag: unmanagedImageParts.join("/")
+    };
+  }
apps/webapp/app/v3/marqs/index.server.ts (1)

1910-1913: Use consistent logging approach.

The code uses console.warn for logging while the rest of the file uses the logger utility. Consider using the logger utility for consistency.

-      console.warn(
-        "Could not initialize MarQS because process.env.REDIS_HOST and process.env.REDIS_PORT are required to be set. Trigger.dev v3 will not work without this."
-      );
+      logger.warn("Could not initialize MarQS because process.env.REDIS_HOST and process.env.REDIS_PORT are required to be set. Trigger.dev v3 will not work without this.", {
+        service: "marqs"
+      });
apps/webapp/app/utils/delays.ts (3)

17-42: Add JSDoc documentation and improve type safety.

The function handles multiple cases but lacks documentation explaining its behavior, parameters, and return values. Also, consider adding maximum future date validation to prevent unreasonable delays.

Add comprehensive documentation and validation:

+/**
+ * Parses a delay value into a future Date object.
+ * @param value - A string (ISO date or natural language duration) or Date object
+ * @returns A Promise that resolves to a future Date or undefined if invalid/past date
+ * @example
+ * // Returns a Date object 2 hours in the future
+ * await parseDelay("2 hours")
+ * // Returns a Date object for a specific future time
+ * await parseDelay("2024-12-31T00:00:00Z")
+ */
 export async function parseDelay(value?: string | Date): Promise<Date | undefined> {
   if (!value) {
     return;
   }

   if (value instanceof Date) {
+    // Validate maximum future date (e.g., 1 year)
+    const maxDate = new Date(Date.now() + 365 * 24 * 60 * 60 * 1000);
+    if (value > maxDate) {
+      return;
+    }
     return value;
   }

34-36: Consider timezone handling in date comparison.

The Date.now() comparison might have timezone-related edge cases. Consider using UTC methods for consistent comparison.

-    if (date.getTime() <= Date.now()) {
+    if (date.getTime() <= new Date().getTime()) {
       return;
     }

17-17: Remove unnecessary async keyword.

The function is marked as async but doesn't use any await operations. The parseNaturalLanguageDuration function appears to be synchronous.

-export async function parseDelay(value?: string | Date): Promise<Date | undefined> {
+export function parseDelay(value?: string | Date): Date | undefined {
apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (2)

34-36: Consider returning 403 or 404 instead of 401
When the run is not found, throwing a 401 might not accurately reflect the scenario. Typically, 401 indicates an authentication failure, whereas 403 indicates a permission issue, and 404 indicates a missing resource. You might want to switch the response to a status code that more closely matches the situation.


42-44: Differentiate between empty vs. retrieval failure
If there are legitimate cases where the snapshot can be empty, consider distinguishing that scenario from a data retrieval failure. This could help the client handle the situation more gracefully. For instance, returning a 204 or an empty structure might be preferable if no snapshot is available but no error truly occurred.

apps/webapp/app/routes/admin.api.v1.workers.ts (1)

46-56: Project update approach
When you set the defaultWorkerGroupId and engine to "V2", consider any existing references or data migration steps required. If older runs or worker groups still rely on the previous engine selection, ensure that switching to V2 doesn’t break backward compatibility or user expectations.

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.logs.debug.ts (1)

37-39: Return 403 or 404 instead of 401 for non-owned or non-existent runs
A 401 status typically means “unauthenticated,” whereas 403 or 404 might more accurately represent a “forbidden” or “not found” scenario. Consider using a status code that aligns better with the underlying cause.

apps/webapp/app/routes/api.v1.workers.ts (2)

21-23: Unify the error message for unsupported engine
The loader route returns a 400 status with the message "Not supported for V1 projects," whereas the action route returns "Not supported" without mentioning V1. Consider using consistent error messages across both routes to avoid confusion.


30-38: Consider adding pagination or a limit for the worker groups
If a project has a large number of worker groups, returning them all at once may degrade performance and user experience. A pagination or limit-based approach can help improve efficiency.

apps/webapp/app/routes/api.v1.dev.presence.ts (1)

46-64: Handle multiple SSE connections carefully
If multiple clients initiate SSE sessions simultaneously, each will set and update the same presence key. Consider whether you need to handle concurrent updates more granularly, particularly if you want to track individual user sessions.

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

46-48: Use a clearer HTTP status code
Throwing a 401 (Unauthorized) when a requested run is not found or not accessible may be slightly misleading. Evaluate whether 403 (Forbidden) or 404 (Not Found) better reflects the run’s absence or restricted access.

apps/webapp/app/routes/api.v1.dev.dequeue.ts (1)

8-62: Validate resource usage to avoid negative values.

Currently, line 53 subtracts the consumed resources from the available resources without any check. If the consumed resources exceed the available resources, this can produce negative values for CPU or memory. Consider clamping or validating resource consumption to ensure that the available resources never drop below zero.

apps/webapp/app/routes/api.v2.tasks.batch.ts (1)

120-152: JWT generation for browser clients.

Generating a short-lived JWT for browser clients is a good practice. Confirm that the scope restriction (“read:batch:<batch.id>”) meets the application’s security requirements. If non-browser clients need restricted access, consider extending this to them as well.

apps/webapp/app/utils/sse.ts (1)

13-17: Use 'undefined' instead of 'void' in union types.

Static analysis warns that “void” in a union type may cause confusion. Replace “void” with “undefined” to enhance clarity and reduce lint warnings.

 type SSEHandlers = {
   /** Return false to stop */
-  beforeStream?: () => Promise<boolean | void> | boolean | void;
+  beforeStream?: () => Promise<boolean | undefined> | boolean | undefined;
   /** Return false to stop */
-  initStream?: (params: HandlerParams) => Promise<boolean | void> | boolean | void;
+  initStream?: (params: HandlerParams) => Promise<boolean | undefined> | boolean | undefined;
   /** Return false to stop */
-  iterator?: (params: HandlerParams & { date: Date }) => Promise<boolean | void> | boolean | void;
+  iterator?: (params: HandlerParams & { date: Date }) => Promise<boolean | undefined> | boolean | undefined;
   cleanup?: () => void;
 };
🧰 Tools
🪛 Biome (1.9.4)

[error] 13-13: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)


[error] 15-15: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)


[error] 17-17: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)

apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (2)

111-170: Check asynchronous cancellation for concurrency safety.

The use of a loop and asynchronous calls within "#cancelPotentiallyRunningAttempts" effectively cancels each attempt. However, consider locking or other concurrency controls to ensure that an attempt’s status isn’t altered in parallel elsewhere, which could result in partial or duplicated cancellations. The switch statement with "assertNever" is a good practice to catch unhandled statuses.


172-185: Inconsistent spelling: “CANCELED” vs. “cancelledAt.”

The code uses “CANCELED” for the run status but “cancelledAt” with two “l”s. This inconsistency can be confusing if future devs rely on naming conventions. Standardize on a single spelling to improve clarity.

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (3)

43-50: Check for a more descriptive error message when parent run is missing.

Currently, if no parent run is found, the code simply returns without explanation. Consider throwing an error or logging a warning so issues with missing parent runs can be detected and diagnosed more easily.


241-248: Clarify large outputs cast to JSON.

When “outputType” is not “application/store” but is a large JSON, we pretty-print the packet. Confirm that extremely large outputs won’t cause performance overhead or memory issues. Consider chunking or streaming output if size can be significant.


379-384: Consistent caching indicator usage.

A truthy check for “span.originalRun” sets “isCached” to true. If future logic requires more nuanced handling of partial caches, consider a more expressive approach that tracks the partial or complete cache state.

apps/webapp/app/v3/runEngineHandlers.server.ts (3)

53-64: Alert enqueuing is a potential bottleneck.

Within the “runFailed” handler, “PerformTaskRunAlertsService.enqueue” should handle large concurrency gracefully. Examine queue depth or employ a backoff strategy for repeated failures, preventing blockages.


146-174: Check partial run attempt failures.

“runAttemptFailed” tries to crash all incomplete events except the current “spanId.” Consider whether any partial-lifecycle events not strictly tied to the failing attempt should remain open.


350-425: Confirm worker notifications are reliably delivered.

Worker notifications rely on socket-based emissions. Verify that any fallback or retry mechanism is in place if the socket is disconnected. Similarly, ensure that the debug logging does not block core logic if the event fails mid-way.

apps/webapp/app/v3/handleSocketIo.server.ts (2)

51-76: Unify namespace authentication if possible.
The logic for creating the workerNamespace and devWorkerNamespace looks solid, offering two authentication paths with distinct validations. However, to reduce duplication and improve maintainability, consider abstracting shared authentication logic into a single helper function.


409-418: Validate multi-value headers or provide explicit handling.
The new headersFromHandshake function copies headers from the socket handshake into a Headers object but skips multi-valued entries. If a single handshake header could contain multiple values, consider how that might affect downstream processing.

apps/webapp/app/v3/services/batchTriggerV3.server.ts (3)

64-143: Robust error handling in the call() method.
The method properly addresses multiple error scenarios (e.g., out-of-entitlement, Prisma uniqueness violations, invalid body items). Consider adding logs or metrics at key exit points to quickly identify root causes in production. Overall, the approach is thorough and well-structured.


317-478: processBatchTaskRun: Retries and graceful failover.
The method implements a max attempt count (MAX_ATTEMPTS) and logs relevant details, which is excellent. Consider updating batch status in the database if the attempt fails so that external monitoring tools can track failures and possibly trigger alerts.


480-681: Encapsulation of item-by-item processing and object store usage.
The #processBatchTaskRunItems and #handlePayloadPacket methods demonstrate a clear separation of concerns. The item-by-item approach is flexible but can be expensive if items are huge or numerous. Monitor performance overhead, and consider batching multiple item requests if necessary.

apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (1)

726-846: createLoaderWorkerApiRoute: Good initial shape, consider fine-grained error codes.
The new worker loader route function effectively parses params, query, and headers before verifying the worker token. You might want to return more specific error codes or messages to surface the cause of failure (e.g., token expired vs. token invalid).

apps/webapp/app/utils/string.ts (1)

1-3: Add input validation for edge cases.

While the implementation is correct for basic use cases, it could be more robust by handling edge cases.

Consider this improved implementation:

 export function capitalizeWord(word: string) {
+  if (!word) return word;
   return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
 }
apps/webapp/app/presenters/v3/DevPresenceStream.server.ts (1)

5-13: Convert class to utility functions.

The class contains only static methods. Following TypeScript best practices and static analysis suggestions, consider converting this to utility functions.

Here's the suggested implementation:

-export class DevPresenceStream {
-  static getPresenceKey(environmentId: string) {
-    return `${PRESENCE_KEY_PREFIX}${environmentId}`;
-  }
-
-  static getPresenceChannel(environmentId: string) {
-    return `${PRESENCE_CHANNEL_PREFIX}${environmentId}`;
-  }
-}
+export function getPresenceKey(environmentId: string) {
+  return `${PRESENCE_KEY_PREFIX}${environmentId}`;
+}
+
+export function getPresenceChannel(environmentId: string) {
+  return `${PRESENCE_CHANNEL_PREFIX}${environmentId}`;
+}
🧰 Tools
🪛 Biome (1.9.4)

[error] 5-16: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)

apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.latest.ts (1)

22-24: Enhance error message with context.

The current error message is generic. Consider including the runFriendlyId in the error message to help with debugging.

     if (!executionData) {
-      throw new Error("Failed to retrieve latest snapshot");
+      throw new Error(`Failed to retrieve latest snapshot for run ${runFriendlyId}`);
     }
apps/webapp/app/routes/api.v1.dev.config.ts (1)

21-22: Add environment variable validation.

Consider adding validation for the environment variables to ensure they have expected values:

+const envSchema = z.object({
+  DEV_DEQUEUE_INTERVAL_WITH_RUN: z.number().int().positive(),
+  DEV_DEQUEUE_INTERVAL_WITHOUT_RUN: z.number().int().positive(),
+});

 try {
+  const validatedEnv = envSchema.parse({
+    DEV_DEQUEUE_INTERVAL_WITH_RUN: env.DEV_DEQUEUE_INTERVAL_WITH_RUN,
+    DEV_DEQUEUE_INTERVAL_WITHOUT_RUN: env.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN,
+  });
   return json({
     environmentId: authentication.environment.id,
-    dequeueIntervalWithRun: env.DEV_DEQUEUE_INTERVAL_WITH_RUN,
-    dequeueIntervalWithoutRun: env.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN,
+    dequeueIntervalWithRun: validatedEnv.DEV_DEQUEUE_INTERVAL_WITH_RUN,
+    dequeueIntervalWithoutRun: validatedEnv.DEV_DEQUEUE_INTERVAL_WITHOUT_RUN,
   });
apps/webapp/app/routes/api.v2.batches.$batchId.ts (1)

30-38: Add type safety to response structure.

The response structure could benefit from a defined interface/type to ensure type safety and provide better documentation.

interface BatchResponse {
  id: string;
  status: string;
  idempotencyKey?: string;
  createdAt: Date;
  updatedAt: Date;
  runCount: number;
  runs: string[];
}

return json<BatchResponse>({
  id: batch.friendlyId,
  status: batch.status,
  idempotencyKey: batch.idempotencyKey ?? undefined,
  createdAt: batch.createdAt,
  updatedAt: batch.updatedAt,
  runCount: batch.runCount,
  runs: batch.runIds,
});
apps/webapp/app/routes/api.v1.deployments.latest.ts (2)

13-13: Enhance error message with more details.

The error message could be more descriptive to help clients better understand the issue.

-    return json({ error: "Invalid or Missing API key" }, { status: 401 });
+    return json({ error: "Authentication failed: Invalid or missing API key in request headers" }, { status: 401 });

32-41: Type the response structure for better maintainability.

Consider defining an interface for the response structure to improve type safety and maintainability.

interface DeploymentResponse {
  id: string;
  status: string;
  contentHash: string | null;
  shortCode: string | null;
  version: string | null;
  imageReference: string | null;
  errorData: any | null;
}

return json<DeploymentResponse>({
  id: deployment.friendlyId,
  status: deployment.status,
  contentHash: deployment.contentHash,
  shortCode: deployment.shortCode,
  version: deployment.version,
  imageReference: deployment.imageReference,
  errorData: deployment.errorData,
});
apps/webapp/app/v3/featureFlags.server.ts (2)

4-6: Add documentation for the feature flag catalog.

Consider adding JSDoc comments to document the purpose and usage of each feature flag.

 const FeatureFlagCatalog = {
+  /** 
+   * The default worker instance group ID to use when none is specified.
+   * This flag controls the fallback behavior for worker assignment.
+   */
   defaultWorkerInstanceGroupId: z.string(),
 };

26-28: Enhance error handling in flags function.

Consider logging parse failures to help with debugging feature flag issues.

     if (!parsed.success) {
+      logger.warn("Failed to parse feature flag value", {
+        key: opts.key,
+        value: value?.value,
+        errors: parsed.error.errors,
+      });
       return;
     }
apps/webapp/app/v3/services/cancelTaskRun.server.ts (1)

36-45: Enhance error handling in callV2.

Consider adding try-catch block to handle potential engine failures gracefully.

   private async callV2(
     taskRun: TaskRun,
     options?: CancelTaskRunServiceOptions
   ): Promise<CancelTaskRunServiceResult | undefined> {
+    try {
       const result = await engine.cancelRun({
         runId: taskRun.id,
         completedAt: options?.cancelledAt,
         reason: options?.reason,
         tx: this._prisma,
       });
+    } catch (error) {
+      logger.error("Failed to cancel task run", {
+        taskRunId: taskRun.id,
+        error,
+      });
+      throw error;
+    }
apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts (2)

28-28: Address the TODO comment about creating a debug span.

The TODO comment suggests adding debug span functionality. Consider implementing this for better observability.

Would you like me to help implement the debug span functionality?


46-49: Make error handling more specific.

Consider catching specific error types and providing appropriate error responses.

-    } catch (error) {
+    } catch (error) {
+      if (error instanceof PrismaError) {
+        logger.error("Database error while restoring run", {
+          runFriendlyId,
+          snapshotFriendlyId,
+          error,
+        });
+        return json({ ok: false, error: "Database error" }, { status: 500 });
+      }
       logger.error("Failed to restore run", { runFriendlyId, snapshotFriendlyId, error });
-      throw error;
+      return json({ ok: false, error: "Internal server error" }, { status: 500 });
     }
apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts (1)

46-49: Improve error propagation in catch block.

The catch block logs the error but then re-throws it without any additional context, which might make debugging harder.

Consider wrapping the error with additional context:

    } catch (error) {
      logger.error("Failed to suspend run", { runFriendlyId, snapshotFriendlyId, error });
-      throw error;
+      throw new Error(`Failed to suspend run ${runFriendlyId}`, { cause: error });
    }
apps/webapp/app/assets/icons/TaskCachedIcon.tsx (1)

1-10: Enhance accessibility for TaskCachedIcon.

The SVG icon lacks accessibility attributes which are important for screen readers.

Add ARIA attributes and title:

export function TaskCachedIcon({ className }: { className?: string }) {
  return (
    <svg
      className={className}
      width="16"
      height="16"
      viewBox="0 0 16 16"
      fill="none"
      xmlns="http://www.w3.org/2000/svg"
+      role="img"
+      aria-label="Task Cached"
    >
apps/webapp/app/v3/services/baseService.server.ts (1)

41-44: Consider making engine required in WithRunEngineOptions.

The engine property is optional in WithRunEngineOptions, but it's a core dependency that should probably be required.

Consider making the engine property required:

export type WithRunEngineOptions<T> = T & {
  prisma?: PrismaClientOrTransaction;
-  engine?: RunEngine;
+  engine: RunEngine;
};
apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (2)

4-8: Remove unused imports.

The following imports are not used in the code:

  • WorkerApiDebugLogBody
  • WorkerApiRunAttemptStartResponseBody
-import {
-  WorkerApiDebugLogBody,
-  WorkerApiRunAttemptStartResponseBody,
-  WorkloadHeartbeatResponseBody,
-} from "@trigger.dev/core/v3/workers";
+import { WorkloadHeartbeatResponseBody } from "@trigger.dev/core/v3/workers";

48-55: Enhance error handling with specific error types.

The current error handling catches all errors and logs them generically. Consider adding specific error handling for different types of errors (e.g., database errors, validation errors) to provide more meaningful responses.

-    } catch (error) {
+    } catch (error: unknown) {
+      if (error instanceof Response) {
+        throw error;
+      }
+      
       logger.error("Failed to heartbeat dev run", {
         environmentId: authentication.environment.id,
         error,
       });
-      throw error;
+      throw new Response("Internal server error", { status: 500 });
     }
apps/webapp/app/v3/machinePresets.server.ts (1)

55-65: Consider enhancing type safety and performance.

The current implementation could be improved in two ways:

  1. Use explicit type for the return value to ensure type safety
  2. Consider caching the result since machine presets are static
+const cachedMachines: Record<string, MachinePreset> | null = null;
+
 export function allMachines(): Record<string, MachinePreset> {
+  if (cachedMachines) {
+    return cachedMachines;
+  }
+
-  return Object.fromEntries(
+  const result = Object.fromEntries(
     Object.entries(machines).map(([name, preset]) => [
       name,
       {
         name: name as MachinePresetName,
         ...preset,
       },
     ])
   );
+  return (cachedMachines = result);
 }
apps/webapp/app/v3/engineVersion.server.ts (3)

9-26: Add JSDoc to document the decision tree.

The function has complex logic for determining the engine version. Consider adding JSDoc to document the decision tree and possible return values.

+/**
+ * Determines the engine version based on the following priority:
+ * 1. Explicitly provided engine version
+ * 2. Project's engine version (if V1)
+ * 3. Specific worker version's engine
+ * 4. Environment type specific logic:
+ *    - Development: Latest background worker's engine or V1
+ *    - Deployed: Latest deployment type or V2
+ *
+ * @throws Error if a specific worker version is requested but not found
+ * @returns Promise<RunEngineVersion>
+ */
 export async function determineEngineVersion({

46-48: Enhance error message with more context.

The error message could be more descriptive to help with debugging.

-      throw new Error(`Worker not found: environment: ${environment.id} version: ${workerVersion}`);
+      throw new Error(
+        `Worker not found for environment "${environment.id}" and version "${workerVersion}". ` +
+        `Project: "${environment.projectId}"`
+      );

53-57: Consider caching the background worker query result.

For development environments, the background worker query might be called frequently. Consider caching the result with a short TTL to improve performance.

Would you like me to provide an implementation for caching the background worker query results?

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1)

2-10: Remove unused imports.

Several imported types and functions are not used in the code.

Apply this diff to remove unused imports:

- import { assertExhaustive } from "@trigger.dev/core";
import { RunId, SnapshotId } from "@trigger.dev/core/v3/apps";
import {
-  WorkerApiDebugLogBody,
  WorkerApiRunAttemptCompleteRequestBody,
  WorkerApiRunAttemptCompleteResponseBody,
-  WorkerApiRunAttemptStartResponseBody,
-  WorkloadHeartbeatResponseBody,
} from "@trigger.dev/core/v3/workers";
apps/webapp/app/routes/admin.api.v1.orgs.$organizationId.environments.staging.ts (1)

57-67: Consider parallel processing for better performance.

The sequential processing of projects could be slow for organizations with many projects. Consider using Promise.all for parallel processing.

Apply this diff to improve performance:

-  for (const project of organization.projects) {
+  const results = await Promise.all(organization.projects.map(async (project) => {
     const stagingEnvironment = project.environments.find((env) => env.type === "STAGING");

     if (!stagingEnvironment) {
       const staging = await createEnvironment(organization, project, "STAGING");
       await updateEnvConcurrencyLimits({ ...staging, organization, project });
-      created++;
+      return true;
     } else {
       await updateEnvConcurrencyLimits({ ...stagingEnvironment, organization, project });
+      return false;
     }
-  }
+  }));
+  created = results.filter(Boolean).length;
apps/webapp/app/v3/services/createDeploymentBackgroundWorker.server.ts (2)

53-62: Extract engine upgrade logic to a separate service.

The engine upgrade logic should be in a separate service for better separation of concerns and reusability.

Consider creating a new service like UpgradeProjectEngineService and moving this logic there:

+class UpgradeProjectEngineService extends BaseService {
+  public async call(project: Project, targetEngine: "V1" | "V2") {
+    if (project.engine === "V1" && targetEngine === "V2") {
+      await this._prisma.project.update({
+        where: { id: project.id },
+        data: { engine: "V2" },
+      });
+    }
+  }
+}

-      if (environment.project.engine === "V1" && body.engine === "V2") {
-        await this._prisma.project.update({
-          where: {
-            id: environment.project.id,
-          },
-          data: {
-            engine: "V2",
-          },
-        });
-      }
+      await new UpgradeProjectEngineService().call(environment.project, body.engine);

84-103: Enhance error handling with specific error types.

The current error handling could be improved to be more specific about the type of error.

Consider creating specific error types:

+class BackgroundWorkerCreationError extends Error {
+  constructor(message: string, public readonly cause: unknown) {
+    super(message);
+    this.name = 'BackgroundWorkerCreationError';
+  }
+}

       throw error;
+      throw new BackgroundWorkerCreationError("Failed to create background worker", error);
apps/webapp/app/v3/services/replayTaskRun.server.ts (1)

122-124: Enhance error logging with structured context.

The error logging could be improved by adding more context and structure.

Consider this enhancement:

       logger.error("Failed to replay a run", {
-        error: error instanceof Error ? error.message : error,
+        error: error instanceof Error ? {
+          message: error.message,
+          name: error.name,
+          stack: error.stack,
+        } : error,
+        taskRunId: existingTaskRun.id,
+        taskRunFriendlyId: existingTaskRun.friendlyId,
+        environmentId: authenticatedEnvironment.id,
       });
apps/webapp/app/v3/services/finalizeDeployment.server.ts (1)

98-98: Add error handling and retry logic for concurrency update.

The concurrency update should have proper error handling and retry logic to ensure it completes successfully.

Consider this enhancement:

-      await updateEnvConcurrencyLimits(authenticatedEnv);
+      try {
+        await retry(
+          async () => {
+            await updateEnvConcurrencyLimits(authenticatedEnv);
+          },
+          {
+            retries: 3,
+            minTimeout: 1000,
+            maxTimeout: 5000,
+            onRetry: (error, attempt) => {
+              logger.warn("Retrying concurrency update", {
+                attempt,
+                error: error instanceof Error ? error.message : error,
+                environmentId: authenticatedEnv.id,
+              });
+            },
+          }
+        );
+      } catch (error) {
+        logger.error("Failed to update concurrency limits", {
+          error: error instanceof Error ? error.message : error,
+          environmentId: authenticatedEnv.id,
+        });
+        // Continue deployment even if concurrency update fails
+      }
apps/webapp/app/models/runtimeEnvironment.server.ts (1)

72-96: Simplify the return statement by removing redundant optional chaining.

The optional chaining operator on line 95 is redundant since we already check for !taskRun on line 91.

  if (!taskRun) {
    return null;
  }

-  return taskRun?.runtimeEnvironment;
+  return taskRun.runtimeEnvironment;
apps/webapp/app/services/personalAccessToken.server.ts (1)

132-132: Good addition of warning log!

The warning log improves observability for invalid token formats. Consider adding more context to help with debugging.

-    logger.warn(`PAT doesn't start with ${tokenPrefix}`);
+    logger.warn(`PAT validation failed: token doesn't start with ${tokenPrefix}`, {
+      tokenLength: token.length,
+      tokenPrefix: tokenPrefix,
+    });
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.test.tasks.$taskParam/route.tsx (1)

124-124: Consider using structured error logging.

While the error logging improvement is good, it could be enhanced further with structured logging to include additional context.

Consider this enhancement:

-    logger.error("Failed to start a test run", { error: e instanceof Error ? e.message : e });
+    logger.error("Failed to start a test run", {
+      error: e instanceof Error ? {
+        message: e.message,
+        name: e.name,
+        stack: e.stack
+      } : e,
+      taskParam,
+      projectParam,
+      organizationSlug
+    });
apps/webapp/app/services/worker.server.ts (1)

746-754: Consider adding detailed logging for batch processing V3.

While the implementation is correct, adding detailed logging would help with monitoring and debugging batch processing operations.

Consider enhancing the handler:

       "v3.processBatchTaskRunV3": {
         priority: 0,
         maxAttempts: 5,
         handler: async (payload, job) => {
+          logger.debug("Starting batch task run processing V3", {
+            strategy: payload.strategy,
+            attempt: job.attempts,
+            maxAttempts: job.max_attempts
+          });
           const service = new BatchTriggerV3Service(payload.strategy);
 
           await service.processBatchTaskRun(payload);
+          logger.debug("Completed batch task run processing V3");
         },
       },
apps/webapp/app/v3/eventRepository.server.ts (4)

53-54: Add JSDoc documentation for the new isDebug property.

The isDebug property has been added to multiple types but lacks documentation explaining its purpose and usage.

Add JSDoc comments to document the purpose and behavior of the isDebug property:

+ /** Flag indicating whether this event is for debugging purposes */
  | "isDebug"

Also applies to: 59-59, 169-169, 200-201


266-290: Use type assertion instead of satisfies operator for better type inference.

The satisfies operator in this context might limit type inference. Consider using type assertion instead.

-} satisfies CreatableEvent;
+} as CreatableEvent;

812-813: Consider extracting debug level logic into a constant.

The debug level determination logic could be made more maintainable by extracting it into a constant.

+const DEBUG_EVENT_LEVEL = "WARN" as const;
+
 const isDebug = options.attributes.isDebug;
-level: isDebug ? "WARN" : "TRACE",
+level: isDebug ? DEBUG_EVENT_LEVEL : "TRACE",

Also applies to: 830-830


1580-1601: Enhance error handling in rehydrateAttribute function.

The function could benefit from more robust error handling and type validation.

 function rehydrateAttribute<T extends AttributeValue>(
   properties: Prisma.JsonValue,
   key: string
 ): T | undefined {
   if (properties === null || properties === undefined) {
     return;
   }

   if (typeof properties !== "object") {
     return;
   }

   if (Array.isArray(properties)) {
     return;
   }

   const value = properties[key];

-  if (!value) return;
+  if (value === null || value === undefined) {
+    return;
+  }
+
+  // Add type validation
+  if (typeof value !== typeof (undefined as unknown as T)) {
+    logger.warn("Type mismatch in rehydrateAttribute", {
+      key,
+      expectedType: typeof (undefined as unknown as T),
+      actualType: typeof value
+    });
+    return;
+  }

   return value as T;
 }
apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (1)

658-669: Added admin-only queue information.

Queue information is now conditionally rendered based on admin access, providing additional debugging capabilities for administrators.

Consider adding tooltips to explain the queue information for admin users who might not be familiar with these technical details.

apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (3)

70-110: Consider using a more descriptive discriminator field name.

The discriminated union type uses a generic field name "type". Consider using a more domain-specific discriminator like "messageType" or "queueMessageType" to improve code clarity.

-  type: z.literal("EXECUTE"),
+  messageType: z.literal("EXECUTE"),

674-710: Consider extracting machine preset logic into a separate function.

The machine preset determination logic is repeated in multiple places. Consider extracting it into a reusable function.

+ private async determineTaskRunMachinePreset(
+   existingPreset: string | null,
+   taskConfig: Record<string, any> | null
+ ): Promise<string> {
+   return existingPreset ?? 
+     machinePresetFromConfig(taskConfig ?? {}).name;
+ }

   const lockedTaskRun = await prisma.taskRun.update({
     where: {
       id: message.messageId,
     },
     data: {
       lockedAt: new Date(),
       lockedById: backgroundTask.id,
       lockedToVersionId: worker.id,
       taskVersion: worker.version,
       sdkVersion: worker.sdkVersion,
       cliVersion: worker.cliVersion,
       startedAt: existingTaskRun.startedAt ?? new Date(),
       baseCostInCents: env.CENTS_PER_RUN,
-      machinePreset:
-        existingTaskRun.machinePreset ??
-        machinePresetFromConfig(backgroundTask.machineConfig ?? {}).name,
+      machinePreset: await this.determineTaskRunMachinePreset(
+        existingTaskRun.machinePreset,
+        backgroundTask.machineConfig
+      ),

1995-2019: Consider using environment variable constants.

The environment variable names are hardcoded strings. Consider using constants to prevent typos and improve maintainability.

+ const ENV_VARS = {
+   TRIGGER_JWT: "TRIGGER_JWT",
+   TRIGGER_RUN_ID: "TRIGGER_RUN_ID",
+   TRIGGER_MACHINE_PRESET: "TRIGGER_MACHINE_PRESET",
+ } as const;

  async #buildEnvironmentVariables(
    environment: RuntimeEnvironment,
    runId: string,
    machinePreset: MachinePreset
  ): Promise<Array<EnvironmentVariable>> {
    const variables = await resolveVariablesForEnvironment(environment);

    const jwt = await generateJWTTokenForEnvironment(environment, {
      run_id: runId,
      machine_preset: machinePreset.name,
    });

    return [
      ...variables,
      ...[
-        { key: "TRIGGER_JWT", value: jwt },
-        { key: "TRIGGER_RUN_ID", value: runId },
-        { key: "TRIGGER_MACHINE_PRESET", value: machinePreset.name },
+        { key: ENV_VARS.TRIGGER_JWT, value: jwt },
+        { key: ENV_VARS.TRIGGER_RUN_ID, value: runId },
+        { key: ENV_VARS.TRIGGER_MACHINE_PRESET, value: machinePreset.name },
      ],
    ];
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d38ea0 and 6088c27.

⛔ Files ignored due to path filters (2)
  • internal-packages/run-engine/execution-states.png is excluded by !**/*.png
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (107)
  • .changeset/breezy-turtles-talk.md (1 hunks)
  • .configs/tsconfig.base.json (1 hunks)
  • .github/workflows/e2e.yml (1 hunks)
  • .github/workflows/pr_checks.yml (1 hunks)
  • .github/workflows/unit-tests.yml (1 hunks)
  • .npmrc (1 hunks)
  • .vscode/launch.json (1 hunks)
  • apps/coordinator/package.json (1 hunks)
  • apps/coordinator/src/checkpointer.ts (1 hunks)
  • apps/coordinator/tsconfig.json (1 hunks)
  • apps/docker-provider/package.json (1 hunks)
  • apps/kubernetes-provider/package.json (1 hunks)
  • apps/kubernetes-provider/tsconfig.json (1 hunks)
  • apps/proxy/package.json (0 hunks)
  • apps/webapp/app/assets/icons/TaskCachedIcon.tsx (1 hunks)
  • apps/webapp/app/components/primitives/Switch.tsx (2 hunks)
  • apps/webapp/app/components/runs/v3/BatchFilters.tsx (1 hunks)
  • apps/webapp/app/components/runs/v3/RunFilters.tsx (1 hunks)
  • apps/webapp/app/components/runs/v3/RunIcon.tsx (2 hunks)
  • apps/webapp/app/consts.ts (0 hunks)
  • apps/webapp/app/entry.server.tsx (2 hunks)
  • apps/webapp/app/env.server.ts (3 hunks)
  • apps/webapp/app/models/runtimeEnvironment.server.ts (4 hunks)
  • apps/webapp/app/presenters/v3/DeploymentListPresenter.server.ts (3 hunks)
  • apps/webapp/app/presenters/v3/DeploymentPresenter.server.ts (2 hunks)
  • apps/webapp/app/presenters/v3/DevPresenceStream.server.ts (1 hunks)
  • apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts (1 hunks)
  • apps/webapp/app/presenters/v3/SpanPresenter.server.ts (11 hunks)
  • apps/webapp/app/presenters/v3/TaskListPresenter.server.ts (1 hunks)
  • apps/webapp/app/presenters/v3/TasksStreamPresenter.server.ts (1 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.deployments.$deploymentParam/route.tsx (2 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.deployments/route.tsx (2 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam/route.tsx (5 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.runs.electric.$runParam/route.tsx (1 hunks)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.test.tasks.$taskParam/route.tsx (1 hunks)
  • apps/webapp/app/routes/admin.api.v1.environments.$environmentId.ts (2 hunks)
  • apps/webapp/app/routes/admin.api.v1.orgs.$organizationId.concurrency.ts (2 hunks)
  • apps/webapp/app/routes/admin.api.v1.orgs.$organizationId.environments.staging.ts (2 hunks)
  • apps/webapp/app/routes/admin.api.v1.workers.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.batches.$batchId.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.deployments.latest.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.deployments.ts (2 hunks)
  • apps/webapp/app/routes/api.v1.dev.config.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.presence.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.logs.debug.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.wait.duration.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts (5 hunks)
  • apps/webapp/app/routes/api.v1.tasks.batch.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.connect.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.deployments.$deploymentFriendlyId.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.dequeue.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.logs.debug.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.continue.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.heartbeat.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.wait.duration.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.latest.ts (1 hunks)
  • apps/webapp/app/routes/api.v1.workers.ts (1 hunks)
  • apps/webapp/app/routes/api.v2.batches.$batchId.ts (1 hunks)
  • apps/webapp/app/routes/api.v2.tasks.batch.ts (1 hunks)
  • apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (7 hunks)
  • apps/webapp/app/services/apiRateLimit.server.ts (1 hunks)
  • apps/webapp/app/services/personalAccessToken.server.ts (1 hunks)
  • apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (9 hunks)
  • apps/webapp/app/services/worker.server.ts (3 hunks)
  • apps/webapp/app/utils/delays.ts (2 hunks)
  • apps/webapp/app/utils/sse.server.ts (1 hunks)
  • apps/webapp/app/utils/sse.ts (1 hunks)
  • apps/webapp/app/utils/string.ts (1 hunks)
  • apps/webapp/app/utils/taskEvent.ts (1 hunks)
  • apps/webapp/app/v3/authenticatedSocketConnection.server.ts (2 hunks)
  • apps/webapp/app/v3/engineVersion.server.ts (1 hunks)
  • apps/webapp/app/v3/eventRepository.server.ts (22 hunks)
  • apps/webapp/app/v3/featureFlags.server.ts (1 hunks)
  • apps/webapp/app/v3/handleSocketIo.server.ts (4 hunks)
  • apps/webapp/app/v3/machinePresets.server.ts (1 hunks)
  • apps/webapp/app/v3/marqs/devQueueConsumer.server.ts (1 hunks)
  • apps/webapp/app/v3/marqs/index.server.ts (3 hunks)
  • apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (1 hunks)
  • apps/webapp/app/v3/models/workerDeployment.server.ts (6 hunks)
  • apps/webapp/app/v3/registryProxy.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngine.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngineHandlers.server.ts (1 hunks)
  • apps/webapp/app/v3/runQueue.server.ts (1 hunks)
  • apps/webapp/app/v3/services/baseService.server.ts (2 hunks)
  • apps/webapp/app/v3/services/batchTriggerV2.server.ts (4 hunks)
  • apps/webapp/app/v3/services/batchTriggerV3.server.ts (1 hunks)
  • apps/webapp/app/v3/services/cancelTaskRun.server.ts (1 hunks)
  • apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (1 hunks)
  • apps/webapp/app/v3/services/completeAttempt.server.ts (0 hunks)
  • apps/webapp/app/v3/services/createBackgroundWorker.server.ts (6 hunks)
  • apps/webapp/app/v3/services/createCheckpoint.server.ts (2 hunks)
  • apps/webapp/app/v3/services/createDeployedBackgroundWorker.server.ts (4 hunks)
  • apps/webapp/app/v3/services/createDeploymentBackgroundWorker.server.ts (4 hunks)
  • apps/webapp/app/v3/services/finalizeDeployment.server.ts (2 hunks)
  • apps/webapp/app/v3/services/heartbeatService.server.ts (0 hunks)
  • apps/webapp/app/v3/services/initializeDeployment.server.ts (5 hunks)
  • apps/webapp/app/v3/services/replayTaskRun.server.ts (1 hunks)
⛔ Files not processed due to max files limit (68)
  • apps/webapp/app/v3/services/rescheduleTaskRun.server.ts
  • apps/webapp/app/v3/services/rollbackDeployment.server.ts
  • apps/webapp/app/v3/services/triggerTask.server.ts
  • apps/webapp/app/v3/services/triggerTaskV1.server.ts
  • apps/webapp/app/v3/services/triggerTaskV2.server.ts
  • apps/webapp/app/v3/services/worker/workerGroupService.server.ts
  • apps/webapp/app/v3/services/worker/workerGroupTokenService.server.ts
  • apps/webapp/package.json
  • apps/webapp/prisma/populate.ts
  • apps/webapp/remix.config.js
  • apps/webapp/tsconfig.json
  • internal-packages/database/package.json
  • internal-packages/database/prisma/migrations/20250103152909_add_run_engine_v2/migration.sql
  • internal-packages/database/prisma/migrations/20250106172943_added_span_id_to_complete_to_task_run_waitpoint/migration.sql
  • internal-packages/database/prisma/migrations/20250109131442_added_batch_and_index_to_task_run_waitpoint_and_task_run_execution_snapshot/migration.sql
  • internal-packages/database/prisma/migrations/20250109173506_waitpoint_added_batch_type/migration.sql
  • internal-packages/database/prisma/migrations/20250109175955_waitpoint_added_completed_by_batch_id_index/migration.sql
  • internal-packages/database/prisma/migrations/20250114153223_task_run_waitpoint_unique_constraint_added_batch_index/migration.sql
  • internal-packages/database/prisma/migrations/20250116115746_rename_blocked_by_waitpoints_to_suspended/migration.sql
  • internal-packages/database/prisma/migrations/20250128160520_add_runner_id_to_execution_snapshots/migration.sql
  • internal-packages/database/prisma/migrations/20250130173941_background_worker_added_engine_version_column/migration.sql
  • internal-packages/database/prisma/migrations/20250207104914_added_environment_and_environment_type_to_task_run_execution_snapshot/migration.sql
  • internal-packages/database/prisma/schema.prisma
  • internal-packages/emails/package.json
  • internal-packages/otlp-importer/package.json
  • internal-packages/redis-worker/package.json
  • internal-packages/redis-worker/src/index.ts
  • internal-packages/redis-worker/src/queue.ts
  • internal-packages/redis-worker/src/worker.test.ts
  • internal-packages/redis-worker/src/worker.ts
  • internal-packages/redis-worker/tsconfig.json
  • internal-packages/run-engine/README.md
  • internal-packages/run-engine/package.json
  • internal-packages/run-engine/src/engine/consts.ts
  • internal-packages/run-engine/src/engine/db/worker.ts
  • internal-packages/run-engine/src/engine/errors.ts
  • internal-packages/run-engine/src/engine/eventBus.ts
  • internal-packages/run-engine/src/engine/executionSnapshots.ts
  • internal-packages/run-engine/src/engine/locking.test.ts
  • internal-packages/run-engine/src/engine/locking.ts
  • internal-packages/run-engine/src/engine/machinePresets.ts
  • internal-packages/run-engine/src/engine/statuses.ts
  • internal-packages/run-engine/src/engine/tests/batchTrigger.test.ts
  • internal-packages/run-engine/src/engine/tests/batchTriggerAndWait.test.ts
  • internal-packages/run-engine/src/engine/tests/cancelling.test.ts
  • internal-packages/run-engine/src/engine/tests/checkpoints.test.ts
  • internal-packages/run-engine/src/engine/tests/delays.test.ts
  • internal-packages/run-engine/src/engine/tests/dequeuing.test.ts
  • internal-packages/run-engine/src/engine/tests/heartbeats.test.ts
  • internal-packages/run-engine/src/engine/tests/notDeployed.test.ts
  • internal-packages/run-engine/src/engine/tests/priority.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/triggerAndWait.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/tests/waitpoints.test.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/index.ts
  • internal-packages/run-engine/src/run-queue/index.test.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • internal-packages/run-engine/src/run-queue/simpleWeightedPriorityStrategy.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/shared/index.ts
  • internal-packages/run-engine/tsconfig.json
  • internal-packages/run-engine/vitest.config.ts
  • internal-packages/testcontainers/package.json
  • internal-packages/testcontainers/src/index.ts
💤 Files with no reviewable changes (4)
  • apps/webapp/app/consts.ts
  • apps/webapp/app/v3/services/completeAttempt.server.ts
  • apps/webapp/app/v3/services/heartbeatService.server.ts
  • apps/proxy/package.json
✅ Files skipped from review due to trivial changes (10)
  • .configs/tsconfig.base.json
  • apps/kubernetes-provider/tsconfig.json
  • apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts
  • apps/webapp/app/presenters/v3/TaskListPresenter.server.ts
  • apps/coordinator/src/checkpointer.ts
  • apps/webapp/app/presenters/v3/TasksStreamPresenter.server.ts
  • apps/webapp/app/v3/marqs/devQueueConsumer.server.ts
  • apps/webapp/app/utils/sse.server.ts
  • .changeset/breezy-turtles-talk.md
  • .npmrc
🧰 Additional context used
🪛 GitHub Check: CodeQL
.github/workflows/unit-tests.yml

[warning] 29-29: Unpinned tag for a non-immutable Action in workflow
Unpinned 3rd party Action '🧪 Unit Tests' step Uses Step uses 'docker/login-action' with ref 'v3', not a pinned commit hash

🪛 Biome (1.9.4)
apps/webapp/app/presenters/v3/DevPresenceStream.server.ts

[error] 5-16: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)

apps/webapp/app/utils/sse.ts

[error] 13-13: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)


[error] 15-15: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)


[error] 17-17: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)

🔇 Additional comments (83)
apps/coordinator/tsconfig.json (2)

3-3: Updated ECMAScript target to es2018
The target version has been upgraded from "es2016" to "es2018", which enables newer ECMAScript features. Please ensure that all project dependencies and code are compatible with this target.


1-16: Confirm file inclusion settings after removal of "include" and "exclude"
The properties "include" and "exclude" have been removed from this configuration. Verify that the base configuration (or project structure) adequately restricts the files to compile, to avoid unintentional inclusion of unwanted files.

apps/docker-provider/package.json (1)

23-26: Streamlined DevDependencies in Docker Provider

The removal of the TypeScript-related dependencies (i.e., @types/node and typescript) is consistent with the project's effort to simplify the development setup. The maintained dependency "tsx" at version "^4.7.0" appears sufficient for the tasks at hand.

apps/kubernetes-provider/package.json (1)

23-26: Consistent Dependency Streamlining in Kubernetes Provider

The update to the devDependencies here also shows the removal of the explicit TypeScript dependency, keeping only the required "tsx" (version "^4.7.0") alongside other utilities like dotenv and esbuild. This aligns with the broader project strategy to reduce redundant TypeScript overhead.

apps/coordinator/package.json (1)

25-29: Reduced DevDependencies in Coordinator

The adjustments include the removal of both the TypeScript compiler and the Node.js type definitions (e.g., "@types/node"). Retaining "tsx" at "^4.7.0" should cover necessary transpilation; ensure that the removal of static type-checking dependencies does not impact any build or test workflows that previously depended on them.

apps/webapp/app/v3/services/initializeDeployment.server.ts (2)

21-23: LGTM! Good validation check.

The validation ensures V1 deployments are only allowed for non-V2 engine projects, preventing potential compatibility issues.


75-83: LGTM! Good logging implementation.

The logging provides useful context for deployment creation, including environment, project, version, and image details.

apps/webapp/app/presenters/v3/DeploymentPresenter.server.ts (1)

71-71: LGTM! Consistent type field addition.

The type field is correctly added to both the database selection and the return object, maintaining data consistency.

Also applies to: 158-158

apps/webapp/app/presenters/v3/DeploymentListPresenter.server.ts (1)

98-98: LGTM! Complete type field integration.

The type field is properly integrated across all layers: type definition, SQL query, and return object mapping.

Also applies to: 114-114, 151-151

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.deployments/route.tsx (1)

325-328: LGTM! Enhanced rollback validation.

The rollback capability is now correctly restricted to managed deployments only, preventing potential issues with unmanaged deployments.

apps/webapp/app/v3/marqs/index.server.ts (3)

1668-1668: LGTM! The return statement format is correct.

The array return format in the Lua scripts is consistent with Lua syntax and matches the expected return type in the TypeScript interface declarations.

Also applies to: 1697-1697


1750-1877: LGTM! Redis command type declarations are well-defined.

The TypeScript interface declarations provide strong type safety for Redis commands and accurately reflect the Lua script implementations.


36-36: Verify compatibility with the imported sanitizeQueueName function.

The function implementation has been moved to a shared library. While this improves code reuse, we should ensure that the imported function maintains the same signature and behavior as the previous implementation.

Run the following script to compare the implementations:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Compare the old and new implementations of sanitizeQueueName

# Search for any remaining references to the old implementation
rg -A 5 "function sanitizeQueueName"

# Search for all usages of sanitizeQueueName to verify compatibility
rg "sanitizeQueueName\("

Length of output: 2217


Compatibility of Imported sanitizeQueueName Confirmed

The shell script output confirms that the shared implementation in packages/core/src/v3/apps/queueName.ts remains unchanged and its usage throughout the codebase is consistent with prior expectations. All references to sanitizeQueueName show compatibility with the exported function in index.server.ts.

apps/webapp/app/utils/delays.ts (1)

1-1: Verify the v3 package compatibility.

The import is using a v3 path (@trigger.dev/core/v3/apps) while the PR title mentions "2.0". Please confirm if this is the intended version.

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.latest.ts (2)

10-16: Question about the placeholder 'findResource: async () => 1' implementation
It appears that the findResource function is returning the numeric value 1 without actually querying or validating anything. If this placeholder is intentional (e.g., used solely to satisfy the API builder's interface), consider adding a comment about its purpose or removing it if no longer needed.


47-53: Potentially limit or wrap the thrown error
Currently, the raw error is rethrown. If the engine error or Prisma query failure exposes sensitive details, you may want to wrap or sanitize it before throwing. Logging the full error is helpful for debugging, but returning it verbatim to the client might leak internal information.
[security]

apps/webapp/app/routes/admin.api.v1.workers.ts (2)

14-34: Authorization checks look good, but consider more precise status codes
The request flow handles invalid credentials (401) and admin-only checks (403). Ensure this logic aligns with your organization’s guidelines around HTTP status codes. For instance, returning 404 instead of 401 or 403 in certain resource-limiting contexts might be preferable to avoid exposing private resource existence.


36-39: Validate empty request bodies
You parse the JSON request with Zod, but the code sets rawBody to an empty object in the event the request is empty. Ensure that your front-end or any calling services always provide a JSON body, or handle the case gracefully by providing default values for name, description, etc.

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.logs.debug.ts (3)

41-51: Good use of recordRunDebugLog with structured attributes
Your structured approach to logging debug messages, timestamp, and properties facilitates better observability. This consistent format can be invaluable when filtering or analyzing logs.


56-63: Switch statement for error codes is correct
Using a switch on eventResult.code ensures you handle known error cases distinctly and guard against unexpected values by asserting exhaustiveness. This is an excellent practice.


64-70: Error rethrow might leak internal information
Rethrowing the original error object can leak confidential data. In production, consider wrapping or sanitizing the error. At the very least, ensure it won’t include stack traces or sensitive system details when the code runs outside of a dev environment.
[security]

apps/webapp/app/routes/api.v1.workers.ts (1)

63-66: Verify the security implications of returning plaintext tokens
The token's plaintext value is directly included in the response. Ensure that it is only transmitted over secure connections (HTTPS/TLS) and that no sensitive information is stored in the token that could be misused if intercepted.

apps/webapp/app/routes/api.v1.dev.presence.ts (1)

9-16: Ensure your Redis connection is properly secured
You're correctly using environment variables for Redis configuration. Verify that they're not logged anywhere inadvertently, and that TLS is properly enabled for production environments to protect data in transit.

apps/webapp/app/routes/api.v1.dev.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

77-101: Reassess returning environment variables in responses
The returned environment variables include a JWT and other potentially sensitive details. Confirm that exposing these values in the client response aligns with your security strategy and least-privilege principles.

apps/webapp/app/routes/api.v1.dev.dequeue.ts (1)

88-90: Clarify resource threshold condition.

The function returns true only if both CPU and memory are strictly greater than zero. If exactly zero resources remain, the function deems it insufficient. Verify that this logic aligns with the desired behavior. Otherwise, consider using ≥ instead of > if minimal resource usage >= 0 is acceptable.

apps/webapp/app/routes/api.v2.tasks.batch.ts (2)

19-49: Body validation logic looks good.

The checks for empty batches and for maximum batch size help ensure correct usage. The 400 status code is appropriate for these scenarios.


51-116: Robust error handling.

Returning the message in JSON while logging full error details is a balanced approach, preserving confidentiality while still aiding diagnosis through logs. Just verify no personally identifiable information is included in error messages.

apps/webapp/app/utils/sse.ts (1)

38-183: SSE connection management and graceful termination.

The logic properly sets up a combined abort signal and registers event listeners on request, timeout, and the internal controller. This ensures correct cleanup and termination. Make sure you monitor open connections in production to avoid excessive resource usage if clients fail to close streams.

apps/webapp/app/v3/services/cancelTaskRunV1.server.ts (2)

26-30: Definition clarity is good.

This type cleanly enumerates the optional fields needed for cancellation. It ensures future expansion without breaking existing usage.


32-49: Ensure errors after status transition are handled.

After setting a run’s status to “CANCELED,” subsequent errors (e.g., DB errors, network issues) might leave the system in a partially updated state. Consider wrapping critical steps in a transaction or adding rollback logic to handle failures post-status-change.

apps/webapp/app/presenters/v3/SpanPresenter.server.ts (1)

79-85: Optimize retrieval logic for runs.

Using “eventRepository.getSpan” followed by “findFirst” on the same span might be redundant in high-traffic environments. Investigate whether you can combine or streamline these calls to reduce database trips.
[performance]

apps/webapp/app/v3/runEngineHandlers.server.ts (2)

67-144: Ensure incomplete events are universally cleaned up on run failure.

The logic uses “queryIncompleteEvents” and completes them as error events. This approach is correct for single-run scope. However, if you have multi-run dependencies, ensure they’re not erroneously marked as error without reason.


199-233: Expiring runs with TTL.

The code logs an exception event stating that the run expired, but verify that any external systems expecting a final “failure” or “canceled” status are updated if TTL is reached.

apps/webapp/app/v3/handleSocketIo.server.ts (2)

1-38: Smooth import structure and environment-driven configurations.
These initial lines effectively organize essential imports (e.g., EventBusEventArgs, Redis, etc.) and environment-based logic for Socket.IO. Everything appears to be in order, especially the conditional loading of Redis parameters. Just ensure proper fallback behavior when environment variables are missing (e.g., REDIS_HOST, REDIS_PORT) to prevent runtime errors.


632-634: Sanitize or validate room naming if friendly IDs originate externally.
The roomFromFriendlyRunId function simply prefixes “room:” to the run ID without validation. If run IDs are user-provided or untrusted, you might enforce an allowed character set or length limit to avoid collisions or unexpected socket behavior.

apps/webapp/app/v3/services/batchTriggerV3.server.ts (2)

1-38: Strategic use of constants and enumerations.
Defining ASYNC_BATCH_PROCESS_SIZE_THRESHOLD, PROCESSING_BATCH_SIZE, and BatchProcessingStrategy early in the file centralizes configuration and promotes clarity. Validate that concurrency or memory overhead doesn’t spike with large batch sizes.


145-315: Mix of synchronous and asynchronous batch handling.
The #createAndProcessBatchTaskRun method chooses between immediate (inline) processing and asynchronous queueing depending on batch size. This design is good for performance, but watch for potential edge cases if the inline path partially fails before transitioning to the job queue. Logging partial progress and partial failures is crucial.

apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (3)

20-23: Worker authentication imports appear consistent.
These new imports from workerGroupTokenService.server provide worker authentication capabilities. The usage pattern is consistent with other authentication approaches in this file.


25-25: Flexible AnyZodSchema is beneficial.
Expanding beyond z.AnyZodObject to AnyZodSchema improves the range of valid schema definitions. Ensure that discriminated unions are tested to avoid unexpected parse failures in complex schemas.


848-1010: createActionWorkerApiRoute: Validate request method and parse body thoroughly.
The action route support for worker tokens is a welcome addition. The JSON parsing flow is robust, but consider limiting the maximum content size or adding a structured approach to partial body validation if the worker requests can be large.

apps/webapp/app/presenters/v3/DevPresenceStream.server.ts (1)

14-15: Clarify Redis integration requirements.

The TODO comments suggest incomplete Redis integration. Please provide more details about:

  1. The Redis client creation strategy
  2. The source of Redis options

Would you like me to help implement the Redis client integration once the requirements are clarified?

apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.start.ts (1)

1-32: Well-structured API route with proper validation!

The implementation is clean and follows best practices:

  • Uses Zod for request validation
  • Properly typed with TypeScript
  • Clear separation of concerns
  • Consistent with API route patterns
apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.attempts.complete.ts (1)

1-33: Clean implementation with proper response wrapping!

The implementation follows the same patterns as the start route with a proper response structure:

  • Uses Zod for request validation
  • Properly typed with TypeScript
  • Response is correctly wrapped in a result field
apps/webapp/app/routes/api.v1.batches.$batchId.ts (1)

37-37: Verify type safety of the runs field.

Please ensure that batch.runIds is properly typed and matches the expected response type. Consider adding runtime validation if needed.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check the type definition of batch.runIds
# Expected: Should find the type definition in the schema or model files

# Search for batch type definitions
rg -A 5 "type.*Batch|interface.*Batch|model.*Batch"

# Search for runIds field specifically
rg "runIds.*:"

Length of output: 25989


Type Safety of the runs Field Is Consistent
The API schema defines the runs field as an array of strings (using z.string().array()), and the database migration also confirms that the underlying runIds column is a TEXT array. In our code (including in BatchTriggerTaskResponse defined in packages/core/src/v3/schemas/api.ts) the batch.runIds is expected and validated as a string[]. No further runtime validation is necessary.

apps/webapp/app/routes/api.v1.dev.config.ts (1)

10-10: Clarify the purpose of returning constant 1.

The findResource function returning a constant 1 seems unusual. Consider adding a comment explaining why this is needed or if it can be replaced with a more meaningful implementation.

apps/webapp/app/routes/api.v1.worker-actions.runs.$runFriendlyId.logs.debug.ts (1)

29-36: LGTM! Well-structured error handling.

The switch statement with exhaustive checking and appropriate HTTP status codes is well implemented. The use of 400 status for preventing retries is a good practice.

apps/webapp/app/v3/runEngine.server.ts (1)

17-22: Validate Redis configuration values.

The Redis configuration uses optional chaining but doesn't validate the values. Missing or invalid Redis configuration could cause runtime issues.

Consider adding validation for the Redis configuration values. Also, verify if TLS should be enabled by default for security:

apps/webapp/app/routes/admin.api.v1.environments.$environmentId.ts (1)

6-6: LGTM! Good abstraction of concurrency limit updates.

The change to use updateEnvConcurrencyLimits is a good architectural decision as it encapsulates the concurrency update logic and makes the code more maintainable.

Also applies to: 64-64

apps/webapp/app/routes/api.v1.deployments.ts (2)

9-9: LGTM! Good addition of ServiceValidationError import.

The import enables proper error type checking and handling.


37-62: Well-structured error handling implementation.

The error handling implementation is robust with:

  • Proper differentiation between validation errors (400) and server errors (500)
  • Good error logging practices
  • Clear error messages in responses
apps/webapp/app/components/runs/v3/RunIcon.tsx (1)

10-10: LGTM! Clean implementation of task-cached icon support.

The implementation follows the existing pattern and maintains consistent styling with other task icons.

Also applies to: 45-46

apps/webapp/app/services/apiRateLimit.server.ts (1)

62-64: LGTM! Clear addition of DEV endpoints to whitelist.

The changes properly document and whitelist the development endpoints for the run engine.

apps/webapp/app/routes/admin.api.v1.orgs.$organizationId.concurrency.ts (1)

6-6: LGTM! Good separation of concerns.

Moving the concurrency limit update logic to a dedicated function improves modularity.

apps/webapp/app/routes/api.v1.tasks.$taskId.trigger.ts (1)

4-4: LGTM! Well-structured changes for engine versioning support.

The changes consistently implement engine versioning across the request-response cycle:

  1. Added RunEngineVersionSchema for type safety
  2. Added header schema validation
  3. Correctly passed to service.call
  4. Added isCached to response

Also applies to: 28-28, 57-57, 83-96, 111-111

apps/webapp/app/routes/api.v1.tasks.batch.ts (1)

59-59: Verify if engineVersion should be passed to service.call.

The engineVersion is extracted from headers but not passed to the service.call method. This might be inconsistent with the single task trigger implementation which does use the engineVersion.

Also applies to: 92-99

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check if BatchTriggerV2Service.call accepts engineVersion parameter
ast-grep --pattern 'class BatchTriggerV2Service {
  $$$
  call($$$) {
    $$$
  }
  $$$
}'

Length of output: 92


I'm not able to determine the parameter list for the BatchTriggerV2Service.call method based on the initial ast-grep output. Let's gather more context by locating the BatchTriggerV2Service definition and then inspecting its call method signature.

Below is a shell script to first locate the file containing the BatchTriggerV2Service class, then print the lines around its call method definition:

Once we have the output from this script, we can verify if the call method’s parameters include an engineVersion parameter.


🏁 Script executed:

#!/bin/bash
# Locate the file defining BatchTriggerV2Service and display the call method signature.

FILE=$(rg -n "class BatchTriggerV2Service" -l | head -n 1)
echo "File found: $FILE"
if [ -n "$FILE" ]; then
  echo "Extracting the call method signature (with context):"
  rg -C 10 "call(" "$FILE"
else
  echo "BatchTriggerV2Service class definition not found."
fi

Length of output: 567


Let's get more detailed context. We need to inspect the actual call method declaration and its parameters in the BatchTriggerV2Service. I'll generate a new shell script that outputs the surrounding 20 lines (for better context) from the call method declaration in the file:


🏁 Script executed:

#!/bin/bash
FILE="apps/webapp/app/v3/services/batchTriggerV2.server.ts"
if [ -f "$FILE" ]; then
  echo "Extracting the call method signature (with 20 lines of context) from $FILE:"
  rg -C 20 "call\(" "$FILE"
else
  echo "File $FILE not found."
fi

Length of output: 3425


EngineVersion Not Required for BatchTriggerV2Service.call

After reviewing the BatchTriggerV2Service implementation in
apps/webapp/app/v3/services/batchTriggerV2.server.ts, we confirmed that the call method’s signature only accepts three parameters (environment, body, and an options object) and does not include an engineVersion parameter. Although the API route (apps/webapp/app/routes/api.v1.tasks.batch.ts) extracts the engineVersion header, it does not pass it to the service.call method—unlike the single task trigger implementation where engineVersion is forwarded. This difference appears to be intentional and aligns with the design of the batch trigger service.

apps/webapp/app/v3/models/workerDeployment.server.ts (1)

93-110: Consider using findUnique for better performance.

The function correctly uses findUnique with a composite key which is more performant than findFirst.

apps/webapp/app/v3/authenticatedSocketConnection.server.ts (1)

1-5: LGTM! The heartbeat service implementation looks good.

The 45-second interval for heartbeats is a reasonable choice for maintaining WebSocket connections while avoiding excessive network traffic.

Also applies to: 75-90

apps/webapp/app/v3/services/createDeployedBackgroundWorker.server.ts (1)

40-53: LGTM! Using BackgroundWorkerId.generate() for consistent ID generation.

The implementation ensures consistent and reliable ID generation for background workers.

apps/webapp/app/entry.server.tsx (1)

208-208: LGTM! Proper singleton registration for event bus handlers.

The singleton pattern ensures that the event bus handlers are initialized only once during the application lifecycle.

Also applies to: 219-219

apps/webapp/app/v3/services/createCheckpoint.server.ts (1)

99-113: LGTM! Robust checkpoint creation with proper ID generation.

The implementation uses CheckpointId.generate() for consistent ID generation and includes comprehensive data for checkpoint creation.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.deployments.$deploymentParam/route.tsx (1)

159-162: LGTM! Good addition of worker type information.

The implementation follows the existing pattern and properly formats the worker type for display.

apps/webapp/app/env.server.ts (3)

94-119: Well-structured Valkey configuration!

Good reuse of Redis configuration with proper fallbacks. The nullish coalescing ensures graceful handling of undefined values.


293-301: Verify the timeout values for Run Engine 2.0.

The timeout values seem reasonable but should be verified against expected task durations and network latencies in production.

Consider:

  1. Making timeouts configurable per environment
  2. Adding monitoring for timeout occurrences
  3. Implementing exponential backoff for retries

303-312: Well-documented development settings.

The comments clearly explain the purpose of each setting. The polling intervals and batch sizes look reasonable for development.

apps/webapp/app/utils/taskEvent.ts (2)

80-80: Good addition of debug flag to span data.

The isDebug flag enhances observability and debugging capabilities.


82-82: Good use of type assertion!

The satisfies operator ensures type safety while allowing the object to maintain its literal type.

apps/webapp/app/v3/registryProxy.server.ts (1)

16-16: LGTM! Clean implementation of Docker image reference handling.

The implementation correctly uses the imported functions from @trigger.dev/core/v3 to parse and rewrite Docker image references. The logic is clear and well-documented with proper error handling and logging.

Also applies to: 51-64

apps/webapp/app/v3/services/createBackgroundWorker.server.ts (2)

19-23: LGTM! Clean import organization.

The imports are well-organized and clearly indicate the purpose of each imported function.


131-131: LGTM! Improved concurrency management.

The direct calls to concurrency management functions improve code clarity and maintainability by removing optional chaining and centralizing the concurrency logic.

Also applies to: 233-234, 244-245

apps/webapp/app/services/worker.server.ts (1)

59-62: LGTM! Clean import organization for batch processing.

The imports clearly separate V2 and V3 batch processing types and services.

.github/workflows/pr_checks.yml (1)

57-59: Add Prisma Client Generation Step

The new step "📀 Generate Prisma Client" is clearly added after dependency installation and before the build step. This helps ensure that the Prisma Client is always generated during the preview release flow. Please verify that the command (pnpm run generate) correctly triggers Prisma Client generation in all intended environments.

.vscode/launch.json (1)

119-126: New Debug Configuration for RunEngine Tests

The addition of the "Debug RunEngine tests" configuration enhances your debugging capabilities for the RunEngine component. The configuration is consistent with your other node-terminal setups. Consider, if needed for your testing environment, adding an envFile configuration to provide any necessary environment variables.

apps/webapp/app/v3/services/batchTriggerV2.server.ts (3)

52-54: LGTM! Helpful class documentation.

The comment clearly indicates this class's purpose for handling larger batches in Run Engine v1.


715-716: Consistent variable naming for friendly IDs.

The variable name change from runIds to runFriendlyIds improves code clarity by being more explicit about the type of ID being used.

Also applies to: 725-727


798-798: Consistent error message with variable naming.

The error message now correctly references runFriendlyId instead of runId, maintaining consistency with the variable name changes.

apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam.spans.$spanParam/route.tsx (3)

427-429: Enhanced run status display for cached runs.

The UI now clearly indicates when a run is cached by appending " (cached)" to the task identifier, improving user understanding of run status.

Also applies to: 433-436


606-621: Improved idempotency information display.

The idempotency section now shows both the key and its expiration date in a clear, organized manner.


654-657: Added engine version display.

The engine version is now displayed in the run details, providing important version information.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.runs.electric.$runParam/route.tsx (1)

222-224: Added keyboard shortcut for canceling runs.

The Cancel run button now includes a keyboard shortcut (C), improving accessibility and user experience.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.v3.$projectParam.runs.$runParam/route.tsx (3)

465-469: Added admin-only debug mode.

The debug mode state is properly initialized and restricted to admin users.


488-488: Efficient debug event filtering.

The filtering logic for debug events is implemented at the tree level, ensuring good performance.


513-521: Well-implemented debug mode toggle.

The debug mode toggle is:

  • Properly restricted to admin users
  • Includes a keyboard shortcut (Shift+D)
  • Has clear labeling
.github/workflows/e2e.yml (1)

41-43: LGTM! The Prisma client generation step is correctly positioned.

The new step is correctly placed after dependency installation and before the build step, ensuring that the Prisma client is available for the build process.

Comment on lines +362 to +363
} else if (batchId.length !== 27 && batchId.length !== 31) {
error = "Batch IDs are 27/32 characters long";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

⚠️ Potential issue

🧩 Analysis chain

Inconsistency in batch ID length validation.

The error message states "27/32 characters" but the validation checks for "27/31 characters". This mismatch could confuse users.

Apply this diff to fix the error message:

-      error = "Batch IDs are 27/32 characters long";
+      error = "Batch IDs are 27/31 characters long";

Additionally, there's an inconsistency with the validation in RunFilters.tsx where batch IDs are still required to be exactly 27 characters long.

Run the following script to verify the batch ID validation across the codebase:


🏁 Script executed:

#!/bin/bash
# Description: Search for batch ID validation logic to ensure consistency.

# Test: Search for batch ID length validation. Expect: Only occurrences of the new validation.
rg -A 5 'batch.*length.*27'

Length of output: 896


Batch ID Validation Inconsistency Found

  • In apps/webapp/app/components/runs/v3/BatchFilters.tsx, the condition checks for batch IDs with lengths 27 or 31, but the error message incorrectly indicates "27/32 characters long." The suggested fix is to update it to "27/31 characters long."
  • Additionally, in apps/webapp/app/components/runs/v3/RunFilters.tsx, the validation is stricter—requiring exactly 27 characters—which introduces an inconsistency between the two components.

Comment on lines +766 to +767
} else if (runId.length !== 25 && runId.length !== 29) {
error = "Run IDs are 25/30 characters long";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Inconsistency in run ID length error message.

The error message states "25/30 characters" but the validation checks for "25/29 characters". This mismatch could confuse users.

Apply this diff to fix the error message:

-      error = "Run IDs are 25/30 characters long";
+      error = "Run IDs are 25/29 characters long";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} else if (runId.length !== 25 && runId.length !== 29) {
error = "Run IDs are 25/30 characters long";
} else if (runId.length !== 25 && runId.length !== 29) {
error = "Run IDs are 25/29 characters long";

Comment on lines +80 to +86
function updateAvailableResources(
availableResources: MachineResources,
resources: MachineResources
) {
availableResources.cpu -= resources.cpu;
availableResources.memory -= resources.memory;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider clamping resource values in updateAvailableResources.

This function subtracts CPU and memory directly from the available resources. To prevent miscalculations (e.g., going into negative territory), clamp the values at zero when the consumed resources exceed what’s available.

 function updateAvailableResources(
   availableResources: MachineResources,
   resources: MachineResources
 ) {
-  availableResources.cpu -= resources.cpu;
-  availableResources.memory -= resources.memory;
+  availableResources.cpu = Math.max(0, availableResources.cpu - resources.cpu);
+  availableResources.memory = Math.max(0, availableResources.memory - resources.memory);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function updateAvailableResources(
availableResources: MachineResources,
resources: MachineResources
) {
availableResources.cpu -= resources.cpu;
availableResources.memory -= resources.memory;
}
function updateAvailableResources(
availableResources: MachineResources,
resources: MachineResources
) {
availableResources.cpu = Math.max(0, availableResources.cpu - resources.cpu);
availableResources.memory = Math.max(0, availableResources.memory - resources.memory);
}

Comment on lines +50 to +105
const finalizeService = new FinalizeTaskRunService();
const cancelledTaskRun = await finalizeService.call({
id: taskRun.id,
status: "CANCELED",
completedAt: opts.cancelledAt,
include: {
attempts: {
where: {
status: {
in: CANCELLABLE_ATTEMPT_STATUSES,
},
},
include: {
backgroundWorker: true,
dependencies: {
include: {
taskRun: true,
},
},
batchTaskRunItems: {
include: {
taskRun: true,
},
},
},
},
runtimeEnvironment: true,
lockedToVersion: true,
},
attemptStatus: "CANCELED",
error: {
type: "STRING_ERROR",
raw: opts.reason,
},
});

const inProgressEvents = await eventRepository.queryIncompleteEvents({
runId: taskRun.friendlyId,
});

logger.debug("Cancelling in-progress events", {
inProgressEvents: inProgressEvents.map((event) => event.id),
});

await Promise.all(
inProgressEvents.map((event) => {
return eventRepository.cancelEvent(event, opts.cancelledAt, opts.reason);
})
);

// Cancel any in progress attempts
if (opts.cancelAttempts) {
await this.#cancelPotentiallyRunningAttempts(cancelledTaskRun, cancelledTaskRun.attempts);
await this.#cancelRemainingRunWorkers(cancelledTaskRun);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate concurrency edge cases for finalizing runs.

When calling "FinalizeTaskRunService" to mark the task run as canceled, ensure concurrent requests don’t overwrite the updated status before or after "finalizeService.call" finishes. If multiple components call “cancel” simultaneously, record-based locking or transaction-level consistency may be needed to prevent race conditions.
[final]

Comment on lines +20 to +51
export function registerRunEngineEventBusHandlers() {
engine.eventBus.on("runSucceeded", async ({ time, run }) => {
try {
const completedEvent = await eventRepository.completeEvent(run.spanId, {
endTime: time,
attributes: {
isError: false,
output:
run.outputType === "application/store" || run.outputType === "text/plain"
? run.output
: run.output
? (safeJsonParse(run.output) as Attributes)
: undefined,
outputType: run.outputType,
},
});

if (!completedEvent) {
logger.error("[runSucceeded] Failed to complete event for unknown reason", {
runId: run.id,
spanId: run.spanId,
});
return;
}
} catch (error) {
logger.error("[runSucceeded] Failed to complete event", {
error: error instanceof Error ? error.message : error,
runId: run.id,
spanId: run.spanId,
});
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Wrap critical event completions in a transaction.

When handling the “runSucceeded” event, multiple I/O calls can fail halfway. Wrap steps in a transaction or handle partial failures more robustly to keep event states consistent.

@@ -97,7 +98,7 @@ export async function action({ request, params }: ActionFunctionArgs) {
},
});

await marqs?.updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
await updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding error handling for updateEnvConcurrencyLimits.

The function call should be wrapped in a try-catch block to handle potential errors gracefully.

-    await updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
+    try {
+      await updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
+    } catch (error) {
+      console.error("Failed to update environment concurrency limits:", error);
+      return json(
+        { error: "Failed to update environment concurrency limits" },
+        { status: 500 }
+      );
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
try {
await updateEnvConcurrencyLimits({ ...modifiedEnvironment, organization });
} catch (error) {
console.error("Failed to update environment concurrency limits:", error);
return json(
{ error: "Failed to update environment concurrency limits" },
{ status: 500 }
);
}

Comment on lines +37 to +47
if (props.shortcut) {
useShortcutKeys({
shortcut: props.shortcut,
action: () => {
if (innerRef.current) {
innerRef.current.click();
}
},
disabled: props.disabled,
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Move shortcut action outside render function and avoid conditional hook.

The current implementation has two issues:

  1. The shortcut action is defined inside the render function, which could cause unnecessary re-renders
  2. The useShortcutKeys hook is called conditionally, which violates React hooks rules

Apply this diff to fix both issues:

+  const handleShortcutAction = React.useCallback(() => {
+    if (innerRef.current) {
+      innerRef.current.click();
+    }
+  }, []);
+
+  useShortcutKeys({
+    shortcut: props.shortcut,
+    action: handleShortcutAction,
+    disabled: props.disabled,
+    enabled: !!props.shortcut,
+  });
-    if (props.shortcut) {
-      useShortcutKeys({
-        shortcut: props.shortcut,
-        action: () => {
-          if (innerRef.current) {
-            innerRef.current.click();
-          }
-        },
-        disabled: props.disabled,
-      });
-    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (props.shortcut) {
useShortcutKeys({
shortcut: props.shortcut,
action: () => {
if (innerRef.current) {
innerRef.current.click();
}
},
disabled: props.disabled,
});
}
const handleShortcutAction = React.useCallback(() => {
if (innerRef.current) {
innerRef.current.click();
}
}, []);
useShortcutKeys({
shortcut: props.shortcut,
action: handleShortcutAction,
disabled: props.disabled,
enabled: !!props.shortcut,
});

Comment on lines +141 to +152
export async function findCurrentUnmanagedWorkerFromEnvironment(
environment: Pick<AuthenticatedEnvironment, "id" | "type">
): Promise<Pick<
BackgroundWorker,
"id" | "friendlyId" | "version" | "sdkVersion" | "cliVersion" | "supportsLazyAttempts"
> | null> {
if (environment.type === "DEVELOPMENT") {
return null;
}

return await findCurrentWorkerFromEnvironment(environment, CURRENT_UNMANAGED_DEPLOYMENT_LABEL);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🛠️ Refactor suggestion

🧩 Analysis chain

Verify return type consistency with findCurrentWorkerFromEnvironment.

The return type of findCurrentUnmanagedWorkerFromEnvironment is missing the engine field that was added to findCurrentWorkerFromEnvironment. This might lead to type errors when using these functions interchangeably.

 export async function findCurrentUnmanagedWorkerFromEnvironment(
   environment: Pick<AuthenticatedEnvironment, "id" | "type">
 ): Promise<Pick<
   BackgroundWorker,
-  "id" | "friendlyId" | "version" | "sdkVersion" | "cliVersion" | "supportsLazyAttempts"
+  "id" | "friendlyId" | "version" | "sdkVersion" | "cliVersion" | "supportsLazyAttempts" | "engine"
 > | null> {

🏁 Script executed:

#!/bin/bash
# Search for the definition of `findCurrentWorkerFromEnvironment` and display its context
rg -n "export async function findCurrentWorkerFromEnvironment" -A 50 .

Length of output: 4567


Inconsistent Return Type: Missing "engine" Field in findCurrentUnmanagedWorkerFromEnvironment

The function findCurrentWorkerFromEnvironment returns an object that includes the "engine" field, whereas findCurrentUnmanagedWorkerFromEnvironment is declared without it. This discrepancy can potentially lead to type errors if these functions are used interchangeably.

  • Verify that both functions maintain the same return type by including the "engine" field in the return type of findCurrentUnmanagedWorkerFromEnvironment.

Comment on lines +55 to +65
//upgrade the project to engine "V2" if it's not already
if (environment.project.engine === "V1" && body.engine === "V2") {
await this._prisma.project.update({
where: {
id: environment.project.id,
},
data: {
engine: "V2",
},
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider making the engine upgrade atomic.

The engine upgrade from V1 to V2 should be atomic to prevent race conditions in concurrent deployments.

-      if (environment.project.engine === "V1" && body.engine === "V2") {
-        await this._prisma.project.update({
-          where: {
-            id: environment.project.id,
-          },
-          data: {
-            engine: "V2",
-          },
-        });
-      }
+      if (environment.project.engine === "V1" && body.engine === "V2") {
+        await this._prisma.project.update({
+          where: {
+            id: environment.project.id,
+            engine: "V1", // Add condition to ensure atomicity
+          },
+          data: {
+            engine: "V2",
+          },
+        });
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//upgrade the project to engine "V2" if it's not already
if (environment.project.engine === "V1" && body.engine === "V2") {
await this._prisma.project.update({
where: {
id: environment.project.id,
},
data: {
engine: "V2",
},
});
}
//upgrade the project to engine "V2" if it's not already
if (environment.project.engine === "V1" && body.engine === "V2") {
await this._prisma.project.update({
where: {
id: environment.project.id,
engine: "V1", // Add condition to ensure atomicity
},
data: {
engine: "V2",
},
});
}

Comment on lines +85 to +95
//upgrade the project to engine "V2" if it's not already
if (project.engine === "V1" && body.engine === "V2") {
await this._prisma.project.update({
where: {
id: project.id,
},
data: {
engine: "V2",
},
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding a rollback mechanism for engine upgrades.

The engine upgrade logic from "V1" to "V2" lacks a rollback mechanism in case of failures during the upgrade process. This could lead to inconsistencies if the upgrade fails after updating the project engine.

Consider wrapping the upgrade in a transaction:

-      if (project.engine === "V1" && body.engine === "V2") {
-        await this._prisma.project.update({
-          where: {
-            id: project.id,
-          },
-          data: {
-            engine: "V2",
-          },
-        });
-      }
+      if (project.engine === "V1" && body.engine === "V2") {
+        await this._prisma.$transaction(async (tx) => {
+          await tx.project.update({
+            where: {
+              id: project.id,
+            },
+            data: {
+              engine: "V2",
+            },
+          });
+          // Add any additional upgrade steps here
+        });
+      }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants