Self-hosted runs not completing #1668

Open
rharkor opened this issue Feb 5, 2025 · 8 comments

Comments

@rharkor

rharkor commented Feb 5, 2025

Provide environment information

System:
OS: Linux 6.8 Ubuntu 24.04.1 LTS 24.04.1 LTS (Noble Numbat)
CPU: (8) x64 Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
Memory: 14.35 GB / 31.29 GB
Container: Yes
Shell: 5.2.21 - /bin/bash
Binaries:
Node: 18.19.1 - /usr/bin/node
npm: 9.2.0 - /usr/bin/npm

Describe the bug

Some tasks get stuck indefinitely in the waiting state for no apparent reason. It is random: the same task won't always get stuck.

Reproduction repo

No idea

To reproduce

I have no idea how to reproduce it, but maybe I am using something wrong. Here is my code:

import { prisma } from "@dimension/core-lib/src/prisma"
import { Shop, ShopCookie } from "@dimension/database-main"
import { tiktokMessages } from "@dimension/tiktok-messages"
import { transformCookie } from "@dimension/tiktok-support-messages"
import { logger, schedules, task } from "@trigger.dev/sdk/v3"

import { handleError, handleMaxDuration, maxDurationPending } from "../lib/error"
import { getActiveShops, wrapCronJob } from "../lib/utils"

const isEnabled = true
const maxDurationWarning = 1000 * 60 * 20 // 20 minutes
const name = "Process shops messages campaigns"
const cron = "0 * * * *" // Every hour

export const processShopMessagesCampaigns = task({
  id: "process-shop-messages-campaigns",
  run: async ({ shop }: { shop: Shop & { cookie: ShopCookie | null } }) => {
    /* Some DB operations */
  },
})

export const processShopsMessagesCampaigns = schedules.task({
  id: "process-shops-messages-campaigns",
  cron: isEnabled ? cron : undefined,
  run: async () => {
    const now = new Date()

    const main = async () => {
      const shops = await getActiveShops(true)
      if (!shops.length) {
        logger.log("No shops found")
        return
      }

      // Trigger one child run per shop, then wait until every child run has finished
      await processShopMessagesCampaigns.batchTriggerAndWait(
        shops.map((shop) => ({
          payload: { shop },
          options: {
            tags: [shop.slug],
          },
        }))
      )
    }

    const checkDuration = maxDurationPending(name, maxDurationWarning)

    await main()
  },
})
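The `maxDurationPending(name, maxDurationWarning)` helper is not shown, and its return value `checkDuration` is never used in the snippet. A minimal sketch of a duration watchdog, assuming the goal is to log a warning when a run exceeds a threshold. The names `startDurationWatchdog` and `runWithWatchdog` are hypothetical and are not the author's implementation:

```typescript
// Hypothetical watchdog, NOT the author's maxDurationPending helper.
// Calls `warn` once if stop() has not been called within `maxMs`.
type Watchdog = { stop: () => number; fired: () => boolean };

function startDurationWatchdog(
  name: string,
  maxMs: number,
  warn: (msg: string) => void = console.warn,
): Watchdog {
  const startedAt = Date.now();
  let hasFired = false;
  const timer = setTimeout(() => {
    hasFired = true;
    warn(`${name} has been running for more than ${maxMs} ms`);
  }, maxMs);

  return {
    // Cancel the pending warning and return the elapsed time in milliseconds.
    stop: () => {
      clearTimeout(timer);
      return Date.now() - startedAt;
    },
    fired: () => hasFired,
  };
}

// Hypothetical usage: always stop the watchdog, even if main() throws.
async function runWithWatchdog(main: () => Promise<void>): Promise<void> {
  const watchdog = startDurationWatchdog("Process shops messages campaigns", 1000 * 60 * 20);
  try {
    await main();
  } finally {
    watchdog.stop();
  }
}
```

Stopping the timer in `finally` matters: an uncleared 20-minute timer keeps the process alive and would fire a spurious warning after a fast run.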

Additional information

No response

@rharkor
Author

rharkor commented Feb 6, 2025

This is the only problem I am encountering, but it is a serious one: since I set a policy of non-overlapping crons, a single stuck run blocks every new cron run.
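One way a non-overlapping policy can avoid being blocked forever by one stuck run is to treat the previous run as stale after a deadline. A minimal sketch, assuming the policy tracks when the last run started and whether it finished. The `CronLock` shape and `shouldStartCronRun` function are hypothetical and are not Trigger.dev APIs:

```typescript
// Hypothetical lock record for one cron job (not a Trigger.dev type).
interface CronLock {
  startedAt: number | null; // epoch ms when the last run started, null if never run
  finished: boolean; // true once that run completed, successfully or not
}

type Decision = "start" | "skip" | "start-stale";

// Decide whether a new cron tick may start:
// - no previous run, or the previous run finished: start normally
// - previous run still active and younger than staleAfterMs: skip (no overlap)
// - previous run still "active" after staleAfterMs: assume it is stuck and start anyway
function shouldStartCronRun(lock: CronLock, now: number, staleAfterMs: number): Decision {
  if (lock.startedAt === null || lock.finished) return "start";
  return now - lock.startedAt >= staleAfterMs ? "start-stale" : "skip";
}
```

The stale case is returned separately from a normal start so it can be logged or alerted on; whether starting a second run next to a stuck one is safe depends on the job being idempotent.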

@matt-aitken matt-aitken changed the title bug: Self-hosted runs not completing Feb 13, 2025
@matt-aitken
Member

This is a self-hosted deployment

@murshudov

For me, the worker container can't connect to the coordinator over WebSocket, and because of this the run gets stuck. @rharkor, could you please share your compose file?
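Before looking at the coordinator itself, it can help to confirm raw TCP reachability from the same network namespace the task container uses. A minimal sketch using Node's built-in `node:net` module. The `canReachTcp` helper is hypothetical, and it only proves that the port accepts connections, not that the WebSocket handshake or authentication succeeds:

```typescript
import * as net from "node:net";

// Resolve true if a TCP connection to host:port succeeds within timeoutMs,
// false on refusal, DNS failure, or timeout. Never rejects.
function canReachTcp(host: string, port: number, timeoutMs = 3000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const done = (ok: boolean) => {
      socket.destroy(); // we only care whether the connect succeeded
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => done(false));
    socket.once("connect", () => done(true));
    socket.once("error", () => done(false));
  });
}
```

For example, calling it with the values of `COORDINATOR_HOST` and `COORDINATOR_PORT` from inside a task container and getting `false` would point at a networking problem rather than at the task code.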

@murshudov

services:
  trigger:
    image: ghcr.io/triggerdotdev/trigger.dev:v3
    environment:
      REMIX_APP_PORT: 3000
      NODE_ENV: production
      RUNTIME_PLATFORM: docker-compose
      V3_ENABLED: true
      TRIGGER_TELEMETRY_DISABLED: 1
      INTERNAL_OTEL_TRACE_DISABLED: 1
      INTERNAL_OTEL_TRACE_LOGGING_ENABLED: 0
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
      MAGIC_LINK_SECRET: $MAGIC_LINK_SECRET
      SESSION_SECRET: $SESSION_SECRET
      ENCRYPTION_KEY: $ENCRYPTION_KEY
      PROVIDER_SECRET: $PROVIDER_SECRET
      COORDINATOR_SECRET: $COORDINATOR_SECRET
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
      DIRECT_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_TLS_DISABLED: true
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      WHITELISTED_EMAILS: ''
      ADMIN_EMAILS: $ADMIN_EMAILS
      DEFAULT_ORG_EXECUTION_CONCURRENCY_LIMIT: 300
      DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT: 100
      DEPLOY_REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      DEPLOY_REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      EMAIL_TRANSPORT: $EMAIL_TRANSPORT
      FROM_EMAIL: $FROM_EMAIL
      REPLY_TO_EMAIL: $REPLY_TO_EMAIL
      SMTP_HOST: $SMTP_HOST
      SMTP_PORT: $SMTP_PORT
      SMTP_SECURE: $SMTP_SECURE
      SMTP_USER: $SMTP_USER
      SMTP_PASSWORD: $SMTP_PASSWORD
      LOGIN_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      APP_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      DEV_OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
      ELECTRIC_ORIGIN: 'http://electric:3000'
    networks:
      - trigger
    depends_on:
      postgresql:
        condition: service_healthy
      redis:
        condition: service_healthy
      electric:
        condition: service_healthy
    healthcheck:
      test: "timeout 10s bash -c ':> /dev/tcp/127.0.0.1/3000' || exit 1"
      interval: 10s
      timeout: 5s
      retries: 5

  docker-provider:
    image: ghcr.io/triggerdotdev/provider/docker:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - trigger
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3000
      PLATFORM_SECRET: $PROVIDER_SECRET
      SECURE_CONNECTION: false
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      DOCKER_NETWORK: trigger
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      FORCE_CHECKPOINT_SIMULATION: 0
      ENFORCE_MACHINE_PRESETS: true
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s

  coordinator:
    image: ghcr.io/triggerdotdev/coordinator:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - trigger
    ports:
      - '127.0.0.1:9020:9020'
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3000
      PLATFORM_SECRET: $PROVIDER_SECRET
      SECURE_CONNECTION: false
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      FORCE_CHECKPOINT_SIMULATION: 0
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s

  electric:
    image: electricsql/electric:latest
    environment:
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
    networks:
      - trigger
    depends_on:
      postgresql:
        condition: service_healthy
    healthcheck:
      test: 'curl --fail http://127.0.0.1:3000/v1/health || exit 1'
      interval: 10s
      retries: 5
      start_period: 10s
      timeout: 10s

  redis:
    image: redis:7
    networks:
      - trigger
    healthcheck:
      test:
        - CMD-SHELL
        - 'redis-cli ping | grep PONG'
      interval: 1s
      timeout: 3s
      retries: 5
    volumes:
      - redis-data:/data

  postgresql:
    image: postgres:16-alpine
    volumes:
      - postgresql-data:/var/lib/postgresql/data/
    networks:
      - trigger
    environment:
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
    command:
      - -c
      - wal_level=logical
    healthcheck:
      test:
        - CMD-SHELL
        - 'pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}'
      interval: 5s
      timeout: 20s
      retries: 10

volumes:
  postgresql-data:
  redis-data:

networks:
  trigger:
    name: trigger
    external: true
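One detail worth checking in this compose file: `COORDINATOR_HOST` is `127.0.0.1` while the provider sets `DOCKER_NETWORK: trigger`. Inside a container attached to a bridge or user-defined network, `127.0.0.1` is that container's own loopback interface, not the host's and not the coordinator's. Assuming the task containers dial `COORDINATOR_HOST:COORDINATOR_PORT` and join `DOCKER_NETWORK`, a hypothetical check over the provider environment could flag that combination. The function name and the rule are assumptions, not part of Trigger.dev:

```typescript
// Hypothetical sanity check for the docker-provider environment.
// Assumption: task containers join DOCKER_NETWORK and connect to
// COORDINATOR_HOST:COORDINATOR_PORT.
type Env = Record<string, string | undefined>;

const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "::1"]);

function coordinatorHostWarnings(env: Env): string[] {
  const warnings: string[] = [];
  const host = env.COORDINATOR_HOST;
  const network = env.DOCKER_NETWORK;

  if (!host) {
    warnings.push("COORDINATOR_HOST is not set");
  } else if (LOOPBACK_HOSTS.has(host) && network !== undefined && network !== "host") {
    // An unset DOCKER_NETWORK is not flagged: the provider's default is not assumed here.
    warnings.push(
      `COORDINATOR_HOST=${host} is a loopback address but DOCKER_NETWORK=${network}; ` +
        "from a container on that network, loopback refers to the container itself",
    );
  }
  if (env.COORDINATOR_PORT && Number.isNaN(Number(env.COORDINATOR_PORT))) {
    warnings.push(`COORDINATOR_PORT=${env.COORDINATOR_PORT} is not a number`);
  }
  return warnings;
}
```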

@rharkor
Author

rharkor commented Feb 13, 2025

I run the app and the worker separately, so I don't really know which compose file you want. I should also mention that everything works fine 99.5% of the time.

@murshudov

I am running them on the same server, but it would be great to see any working configuration. I've been pulling my hair out for the last 3 days :)

@rharkor
Author

rharkor commented Feb 13, 2025

Okay, so this is my config:

docker-compose.webapp.yml

x-env: &webapp-env
  LOGIN_ORIGIN: https://${TRIGGER_DOMAIN:?Please set this in your .env file}
  APP_ORIGIN: https://${TRIGGER_DOMAIN}
  DEV_OTEL_EXPORTER_OTLP_ENDPOINT: https://${TRIGGER_DOMAIN}/otel
  ELECTRIC_ORIGIN: http://electric:3000

volumes:
  postgres-data:
  redis-data:

networks:
  default:

services:
  webapp:
    image: ghcr.io/triggerdotdev/trigger.dev:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    env_file:
      - .env
    environment:
      <<: *webapp-env
    ports:
      - ${WEBAPP_PUBLISH_IP:-127.0.0.1}:3040:3030
    depends_on:
      - postgres
      - redis
    networks:
      - default

  postgres:
    image: postgres:${POSTGRES_IMAGE_TAG:-16}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - postgres-data:/var/lib/postgresql/data/
    env_file:
      - .env
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:5433:5432
    command:
      - -c
      - wal_level=logical

  redis:
    image: redis:${REDIS_IMAGE_TAG:-7}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - redis-data:/data
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:6389:6379

  electric:
    image: electricsql/electric:${ELECTRIC_IMAGE_TAG:-latest}
    restart: ${RESTART_POLICY:-unless-stopped}
    environment:
      DATABASE_URL: $DATABASE_URL
    networks:
      - default
    depends_on:
      - postgres
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:3061:3000

docker-compose.worker.yml

x-env: &worker-env
  PLATFORM_HOST: ${TRIGGER_DOMAIN:?Please set this in your .env file}
  PLATFORM_WS_PORT: 443
  SECURE_CONNECTION: "true"
  OTEL_EXPORTER_OTLP_ENDPOINT: https://${TRIGGER_DOMAIN}/otel

networks:
  default:

services:
  docker-provider:
    image: ghcr.io/triggerdotdev/provider/docker:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:9021:9020
    env_file:
      - .env
    environment:
      <<: *worker-env
      PLATFORM_SECRET: $PROVIDER_SECRET

  coordinator:
    image: ghcr.io/triggerdotdev/coordinator:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:9020:9020
    env_file:
      - .env
    environment:
      <<: *worker-env
      PLATFORM_SECRET: $COORDINATOR_SECRET
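In this split setup, the provider and coordinator reach the webapp through the public domain (`PLATFORM_HOST`, port 443, `SECURE_CONNECTION: "true"`) rather than over a shared Docker network. Assuming these three variables combine into a single WebSocket origin (how the images actually build their URLs is not shown in this thread), a hypothetical helper makes the resulting endpoint explicit:

```typescript
// Hypothetical: derive the WebSocket origin the worker services would dial.
// The combination rule is an assumption, not taken from the provider/coordinator source.
function platformWsOrigin(env: Record<string, string | undefined>): string {
  const host = env.PLATFORM_HOST;
  if (!host) throw new Error("PLATFORM_HOST is required");
  // Environment variables are strings, so compare against the literal "true".
  const secure = env.SECURE_CONNECTION === "true";
  const scheme = secure ? "wss" : "ws";
  const port = env.PLATFORM_WS_PORT ?? (secure ? "443" : "80");
  return `${scheme}://${host}:${port}`;
}
```

With the worker settings above this yields `wss://<TRIGGER_DOMAIN>:443`, so those connections pass through whatever reverse proxy terminates TLS for the domain, and that proxy has to support WebSocket upgrades.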
