package job fails with error: "CPU too busy" #151

Open
insanity54 opened this issue Jan 4, 2025 · 4 comments
Labels
bug Something isn't working

insanity54 commented Jan 4, 2025

Describe the bug
When running a pipeline job, the transcode job succeeds, then the package job fails with the error "CPU too busy".

To Reproduce

POST /pipeline with the following input:

{
  "inputs": [
    {
      "type": "video",
      "path": "s3://2024-06-04.mp4"
    },
    {
      "type": "audio",
      "path": "s3://2024-06-04.mp4",
      "language": "eng"
    }
  ],
  "streams": [
    {
      "type": "video",
      "codec": "h264",
      "height": 360
    },
    {
      "type": "video",
      "codec": "h264",
      "height": 144
    },
    {
      "type": "audio",
      "codec": "aac"
    }
  ],
  "tag": "test"
}

I retried 3 times and got other errors too.

On the second try, I got an error during the ffmpeg(video,360) Job: "Unable to connect. Is the computer able to access the url?", followed by an UnknownError on ffmpeg(video,144). These two errors coincided with my (Starlink) internet cutting out, as it does a few times a day, so I think they were unrelated.

On the third try, I got another error.

ffprobe exited with code 1
ffprobe version 6.0-static https://johnvansickle.com/ffmpeg/
Copyright (c) 2007-2023 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

[tcp @ 0x17f9e3c0] Failed to resolve hostname fp-dev.s3.us-west-000.backblazeb2.com: Temporary failure in name resolution
https://fp-dev.s3.us-west-000.backblazeb2.com/2024-06-04.mp4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=***REDACTED***&X-Amz-Date=20250104T090919Z&X-Amz-Expires=86400&X-Amz-Signature=***REDACTED***&X-Amz-SignedHeaders=host&x-id=GetObject: Input/output error

This one was again during a temporary internet outage, so I think it's unrelated too.

On the fourth try, this time using POST /package instead of POST /pipeline, I got another "CPU too busy" error during the package Job.

On the fifth try, using POST /pipeline, the transcode was successful, but the package job failed with a new error: "no tomes available". This is a Backblaze error I've seen before; apparently it happens more often at night, when people are using B2 to back up their infrastructure.

This leads me to wonder: could "CPU too busy" also be an error from Backblaze? I searched for that string in superstreamer's codebase and couldn't find a match, so at this point that seems like a plausible explanation.

I'll try again during the daytime to see if I can narrow down the problem.

Expected behavior

I was hoping to see the package Job succeed. It's ironic that the transcode Job completes successfully, because I would expect transcoding to be the more CPU-intensive task. But then again, software is complex and doesn't always do what we want 😆

Screenshots

Screenshot 2025-01-04 at 01-29-34 Superstreamer

Desktop (please complete the following information):

  • OS: Manjaro Linux x86_64
  • Browser: Firefox 133.0.3 (64-bit)
  • Version: alpha 2025-01-04 07:37
  • load average: 2.65, 2.61, 2.84
  • Host: Z97X-Gaming 5
  • Kernel: 6.11.11-1-MANJARO
  • CPU: Intel i7-4790 (8) @ 4.000GHz
  • GPU: Intel HD Graphics
  • Memory: 13441MiB / 23892MiB

Smartphone (please complete the following information):
n/a

Additional context

Logs

1
Synced folder in /tmp/superstreamer-cf7f98e4-65bd-4fd9-b6e5-8aed44efed4dXyRDwh

2
Got meta: "{"version":1,"streams":{"audio_eng_128000_aac.m4a":{"type":"audio","codec":"aac","language":"eng","bitrate":128000,"channels":2},"video_144_108000_h264.m4v":{"type":"video","codec":"h264","height":144,"bitrate":108000,"framerate":60}},"segmentSize":2.24}"
3
in=/tmp/superstreamer-cf7f98e4-65bd-4fd9-b6e5-8aed44efed4dXyRDwh/audio_eng_128000_aac.m4a,stream=audio,init_segment=audio_eng_128000_aac/init.mp4,segment_template=audio_eng_128000_aac/$Number$.m4a,playlist_name=audio_eng_128000_aac/playlist.m3u8,hls_group_id=audio_aac,hls_name=eng_aac,language=eng in=/tmp/superstreamer-cf7f98e4-65bd-4fd9-b6e5-8aed44efed4dXyRDwh/video_144_108000_h264.m4v,stream=video,init_segment=video_144_108000_h264/init.mp4,segment_template=video_144_108000_h264/$Number$.m4s,playlist_name=video_144_108000_h264/playlist.m3u8,iframe_playlist_name=video_144_108000_h264/iframe.m3u8 --segment_duration 2.24 --fragment_duration 2.24 --hls_master_playlist_output master.m3u8
4
Uploading to package/a5b8fb72-e9b9-41c3-bd06-1e1dace663f1/hls

The last artisan log before the Job failed

{
  clientName: "S3Client",
  commandName: "PutObjectCommand",
  input: {
    Bucket: "fp-dev",
    Key: "package/a5b8fb72-e9b9-41c3-bd06-1e1dace663f1/hls/video_144_108000_h264/34.m4s",
    Body: ReadStream {
      fd: null,
      bytesRead: 30112,
      [Symbol(mfileSize)]: -1,
      [Symbol(Bun.NodeReadStream)]: true,
      [Symbol(Bun.NodeReadStreamPathOrFd)]: "/tmp/superstreamer-32ef6c48-46fd-48e5-aa4d-bb36c5ecd82dzeCHVh/video_144_108000_h264/34.m4s",
      file: "/tmp/superstreamer-32ef6c48-46fd-48e5-aa4d-bb36c5ecd82dzeCHVh/video_144_108000_h264/34.m4s",
      path: "/tmp/superstreamer-32ef6c48-46fd-48e5-aa4d-bb36c5ecd82dzeCHVh/video_144_108000_h264/34.m4s",
      [Symbol(refCount)]: 0,
      [Symbol(constructed)]: true,
      [Symbol(remainingChunk)]: Buffer(65536) [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 65023 more ],
      [Symbol(pendingRead)]: false,
      [Symbol(hasResized)]: true,
      _readableState: [Object ...],
      _events: [Object ...],
      _eventsCount: 2,
      _maxListeners: undefined,
      [Symbol(kCapture)]: false,
      [Symbol(highWaterMark)]: 16384,
      [Symbol(kHandle)]: null,
      end: Infinity,
      _read: [Function],
      start: 0,
      flags: "r",
      mode: 438,
      emitClose: true,
      [Symbol(Bun.Node.readStreamPathFastPath)]: false,
      pos: 30112,
      [Symbol(kFs)]: [Object ...],
      [Symbol(kIoDone)]: true,
      [Symbol(kAutoDestroyed)]: true,
      _construct: [Function],
      _destroy: [Function],
      close: [Function],
      push: [Function],
      [Symbol(kinternalRead)]: [Function],
      [Symbol(kerrorOrDestroy)]: [Function],
      pause: [Function],
      resume: [Function],
      unshift: [Function],
      pipe: [Function],
      [Symbol(_onClose)]: [Function],
      [Symbol(_onDrain)]: [Function],
      [Symbol(_internalConstruct)]: [Function],
      [Symbol(_getRemainingChunk)]: [Function],
      [Symbol(_adjustHighWaterMark)]: [Function],
      [Symbol(_handleResult)]: [Function],
      [Symbol(_internalRead)]: [Function],
      ref: [Function],
      unref: [Function],
      [Symbol(kEnsureConstructed)]: [Function],
      on: [Function],
      destroy: [Function: destroy2],
      _undestroy: [Function: undestroy],
      isPaused: [Function],
      setEncoding: [Function],
      read: [Function],
      unpipe: [Function],
      addListener: [Function],
      removeListener: [Function],
      off: [Function],
      removeAllListeners: [Function],
      wrap: [Function],
      iterator: [Function],
      readable: [Getter/Setter],
      readableDidRead: [Getter],
      readableAborted: [Getter],
      readableHighWaterMark: [Getter],
      readableBuffer: [Getter],
      readableFlowing: [Getter/Setter],
      readableLength: [Getter],
      readableObjectMode: [Getter],
      readableEncoding: [Getter],
      errored: [Getter],
      closed: [Getter],
      destroyed: [Getter/Setter],
      readableEnded: [Getter],
      asIndexedPairs: [Function],
      drop: [Function],
      filter: [Function],
      flatMap: [Function],
      map: [Function],
      take: [Function],
      every: [Function],
      forEach: [Function],
      reduce: [Function],
      toArray: [Function],
      some: [Function],
      find: [Function],
      [Symbol(nodejs.rejection)]: [Function],
      [Symbol(Symbol.asyncIterator)]: [Function],
    },
    ContentLength: 30112,
    ContentType: "video/iso.segment",
  },
  output: {
    ETag: "\"aa463ea7db978aeb4d1992f392ab8af8\"",
    VersionId: "4_z8d3307ffe1232a9983d30b18_f114cb149baa5217b_d20250104_m092847_c000_v0001074_t0034_u01735982927874",
  },
  metadata: {
    httpStatusCode: 200,
    requestId: "004fc39f93c021ed",
    extendedRequestId: "aZLQzLjchZjAxDTPTYds5KzP1M5hi9jjq",
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0,
  },
}
matvp91 (Collaborator) commented Jan 4, 2025

If the packager itself had failed, we'd never see the "Uploading to" log, so packaging did succeed locally.

I suspect it's B2 throwing that error and it has nothing to do with your local CPU. I found something similar at https://xcp-ng.org/forum/topic/5397/delta-backups-failing-aws-s3-uploadpartcopy-cpu-too-busy

Syncing from and to S3 is handled by https://www.npmjs.com/package/s3-sync-client. I've been thinking of writing our own syncing logic (on top of the S3 JS SDK) that properly handles retries and is more robust. It'll take more time, but I'll see if I can write a quick proof of concept (tomorrow evening perhaps); at least then we'd be able to test our hypothesis that it's related to B2.

If you can build from source, feel free to set maxConcurrentTransfers to 1 in the package job handler (it's a setting from the s3-sync-client package). My thinking is that B2's side gets overloaded by too many concurrent uploads, though that's a wild guess.
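
For reference, a minimal sketch of what that could look like, assuming the package job passes the local packager output to s3-sync-client's sync(); the actual call site in superstreamer may differ, and the paths below are placeholders:

import { S3Client } from "@aws-sdk/client-s3";
import { S3SyncClient } from "s3-sync-client";

// Placeholder client config; point it at your own B2 endpoint and credentials.
const { sync } = new S3SyncClient({ client: new S3Client({}) });

// Upload the local packager output one object at a time instead of in parallel.
await sync("/tmp/superstreamer-<id>/hls", "s3://<your-bucket>/package/<assetId>/hls", {
  maxConcurrentTransfers: 1,
});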

matvp91 added the bug label Jan 5, 2025
matvp91 (Collaborator) commented Jan 6, 2025

@insanity54, I've pushed a commit (9e868ac) where you can specify public and concurrency in the package job. I suggest the following:

  • Use the /transcode endpoint to start a transcode; do not use the pipeline endpoint (for now). This allows us to test quickly without going through a full (re-)transcode.
  • Use the /package endpoint but manually specify the following properties:
{
  assetId: "...", // Your asset id returned from the transcode job.
  public: false, // This is an opt-in, for now, I default it to "true" for backwards compatibility.
  concurrency: 1, // Instruct the package job to upload each segment one after another instead of parallel, I suspect this lowers the burden on B2 which causes the "CPU too busy" error message.
}
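
For example, a rough sketch of what that request could look like; the base URL and any auth headers are placeholders for your own deployment, and the assetId is whatever the transcode job returned:

// Sketch only: base URL is a placeholder for your own superstreamer API.
const res = await fetch("https://<your-superstreamer-api>/package", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    assetId: "<your-asset-id>", // returned by the transcode job
    public: false,              // keep false for a private B2 bucket
    concurrency: 1,             // upload segments one after another
  }),
});
console.log(await res.json());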

Since you've got the source up and running, I didn't take the time to deploy this to the Docker alpha container. Let me know if you'd rather test it that way and I'll gladly deploy it there too as a temporary alpha build.

insanity54 (Author) commented

Testing 9e868ac, I got some successes and some intermittent failures.

/package test 1 (success)

input

{
  assetId: "cea2db20-1d89-4f8b-855f-d1c2e6ae2302",
  concurrency: 5,
  name: "hls",
  public: true,
  step: 2
}

output

{
  assetId: "cea2db20-1d89-4f8b-855f-d1c2e6ae2302"
}

logs

1
Synced folder in /tmp/superstreamer-1cae7c1e-3f90-48eb-a638-4996f0c6bc09uTvyXi

2
Got meta: "{"version":1,"streams":{"video_360_400000_h264.m4v":{"type":"video","codec":"h264","height":360,"bitrate":400000,"framerate":60},"audio_eng_128000_aac.m4a":{"type":"audio","codec":"aac","language":"eng","bitrate":128000,"channels":2},"video_144_108000_h264.m4v":{"type":"video","codec":"h264","height":144,"bitrate":108000,"framerate":60}},"segmentSize":2.24}"
3
in=/tmp/superstreamer-1cae7c1e-3f90-48eb-a638-4996f0c6bc09uTvyXi/video_360_400000_h264.m4v,stream=video,init_segment=video_360_400000_h264/init.mp4,segment_template=video_360_400000_h264/$Number$.m4s,playlist_name=video_360_400000_h264/playlist.m3u8,iframe_playlist_name=video_360_400000_h264/iframe.m3u8 in=/tmp/superstreamer-1cae7c1e-3f90-48eb-a638-4996f0c6bc09uTvyXi/audio_eng_128000_aac.m4a,stream=audio,init_segment=audio_eng_128000_aac/init.mp4,segment_template=audio_eng_128000_aac/$Number$.m4a,playlist_name=audio_eng_128000_aac/playlist.m3u8,hls_group_id=audio_aac,hls_name=eng_aac,language=eng in=/tmp/superstreamer-1cae7c1e-3f90-48eb-a638-4996f0c6bc09uTvyXi/video_144_108000_h264.m4v,stream=video,init_segment=video_144_108000_h264/init.mp4,segment_template=video_144_108000_h264/$Number$.m4s,playlist_name=video_144_108000_h264/playlist.m3u8,iframe_playlist_name=video_144_108000_h264/iframe.m3u8 --segment_duration 2.24 --fragment_duration 2.24 --hls_master_playlist_output master.m3u8
4
Uploading to package/cea2db20-1d89-4f8b-855f-d1c2e6ae2302/hls

/package test 2 (failure)

no tomes available

input

{
  assetId: "cea2db20-1d89-4f8b-855f-d1c2e6ae2302",
  concurrency: 1,
  name: "test",
  public: false
}

output

(none)

logs

1
Synced folder in /tmp/superstreamer-5bdddacb-e0a2-4776-a850-6f58234433afbias8i

2
Got meta: "{"version":1,"streams":{"video_360_400000_h264.m4v":{"type":"video","codec":"h264","height":360,"bitrate":400000,"framerate":60},"audio_eng_128000_aac.m4a":{"type":"audio","codec":"aac","language":"eng","bitrate":128000,"channels":2},"video_144_108000_h264.m4v":{"type":"video","codec":"h264","height":144,"bitrate":108000,"framerate":60}},"segmentSize":2.24}"
3
in=/tmp/superstreamer-5bdddacb-e0a2-4776-a850-6f58234433afbias8i/video_360_400000_h264.m4v,stream=video,init_segment=video_360_400000_h264/init.mp4,segment_template=video_360_400000_h264/$Number$.m4s,playlist_name=video_360_400000_h264/playlist.m3u8,iframe_playlist_name=video_360_400000_h264/iframe.m3u8 in=/tmp/superstreamer-5bdddacb-e0a2-4776-a850-6f58234433afbias8i/audio_eng_128000_aac.m4a,stream=audio,init_segment=audio_eng_128000_aac/init.mp4,segment_template=audio_eng_128000_aac/$Number$.m4a,playlist_name=audio_eng_128000_aac/playlist.m3u8,hls_group_id=audio_aac,hls_name=eng_aac,language=eng in=/tmp/superstreamer-5bdddacb-e0a2-4776-a850-6f58234433afbias8i/video_144_108000_h264.m4v,stream=video,init_segment=video_144_108000_h264/init.mp4,segment_template=video_144_108000_h264/$Number$.m4s,playlist_name=video_144_108000_h264/playlist.m3u8,iframe_playlist_name=video_144_108000_h264/iframe.m3u8 --segment_duration 2.24 --fragment_duration 2.24 --hls_master_playlist_output master.m3u8
4
Uploading to package/cea2db20-1d89-4f8b-855f-d1c2e6ae2302/test

/package test 3 (success)

Input

{
  assetId: "cea2db20-1d89-4f8b-855f-d1c2e6ae2302",
  concurrency: 1,
  name: "test2",
  public: false,
  step: 2
}

Output

{
  assetId: "cea2db20-1d89-4f8b-855f-d1c2e6ae2302"
}

Logs

1
Synced folder in /tmp/superstreamer-53a5ae03-a6f7-47b8-a741-667bfa27ee79joE0df

2
Got meta: "{"version":1,"streams":{"video_360_400000_h264.m4v":{"type":"video","codec":"h264","height":360,"bitrate":400000,"framerate":60},"audio_eng_128000_aac.m4a":{"type":"audio","codec":"aac","language":"eng","bitrate":128000,"channels":2},"video_144_108000_h264.m4v":{"type":"video","codec":"h264","height":144,"bitrate":108000,"framerate":60}},"segmentSize":2.24}"
3
in=/tmp/superstreamer-53a5ae03-a6f7-47b8-a741-667bfa27ee79joE0df/video_360_400000_h264.m4v,stream=video,init_segment=video_360_400000_h264/init.mp4,segment_template=video_360_400000_h264/$Number$.m4s,playlist_name=video_360_400000_h264/playlist.m3u8,iframe_playlist_name=video_360_400000_h264/iframe.m3u8 in=/tmp/superstreamer-53a5ae03-a6f7-47b8-a741-667bfa27ee79joE0df/audio_eng_128000_aac.m4a,stream=audio,init_segment=audio_eng_128000_aac/init.mp4,segment_template=audio_eng_128000_aac/$Number$.m4a,playlist_name=audio_eng_128000_aac/playlist.m3u8,hls_group_id=audio_aac,hls_name=eng_aac,language=eng in=/tmp/superstreamer-53a5ae03-a6f7-47b8-a741-667bfa27ee79joE0df/video_144_108000_h264.m4v,stream=video,init_segment=video_144_108000_h264/init.mp4,segment_template=video_144_108000_h264/$Number$.m4s,playlist_name=video_144_108000_h264/playlist.m3u8,iframe_playlist_name=video_144_108000_h264/iframe.m3u8 --segment_duration 2.24 --fragment_duration 2.24 --hls_master_playlist_output master.m3u8
4
Uploading to package/cea2db20-1d89-4f8b-855f-d1c2e6ae2302/test2

matvp91 (Collaborator) commented Jan 7, 2025

What surprised me is that you got a success with "public" set to both "false" and "true", while earlier we'd get the ACL mismatch error (mentioned in the previous issue). This makes me suspect you're not on the latest main branch.

Meanwhile, I've deployed the latest main branch to the Docker alpha container.

docker compose pull
docker compose up -d

If your B2 bucket is set to private, you must set public to false since 9e868ac, which I believe is the case for you. I'm still not sure that's the final API design I'd prefer, but it'll do for now.

Both errors, "CPU too busy" and "no tomes available", seem to come from the Backblaze B2 API.

  • "CPU too busy": very likely you'd get around that with a lower concurrency number. I've also lowered it by default from 10 to 5.
  • "no tomes available": looks like B2 refusing the request, I guess there's not much we can do besides retrying the package job later on. Today I have no retry mechanism in place but it shouldn't be too hard. There's an old, undocumented, issue for retrying here: Retry job logic on failure #73
