Noise contributions contain invalid PCM data #253

codesoap · 2025-01-12T14:07:51Z

I have taken a closer look at the noise contributions at media.xiph.org/rnnoise/rnnoise_contributions.tar.gz and found some files that contain an odd number of bytes. Since the PCM format in use stores 16-bit samples, this should be impossible. Tools like ffmpeg complain about this. These are the problematic files:

$ wc -c *raw | awk '$1%2!=0'
 4726335 1506624776731-coffee.raw
 5356767 1506902114432-other.raw
 4964103 1506935061275-other.raw
 4013831 1506961537920-train.raw
 5274903 1507082941184-other.raw
 5046023 1507234977723-other.raw
 4587271 1507257911313-other.raw
 4292359 1507259349364-other.raw
 5455623 1507609273368-other.raw
 4628211 1508448093569-none.raw
 4783377 1514620984404-street.raw
 4668705 1519910987703-coffee.raw

The files still seem to contain valid audio data, so I think they could be salvaged by just trimming one byte off the end. This can be done with dd:

dd if=1506624776731-coffee.raw of=tmp bs=1 count=4726334 && mv tmp 1506624776731-coffee.raw
dd if=1506902114432-other.raw  of=tmp bs=1 count=5356766 && mv tmp 1506902114432-other.raw
dd if=1506935061275-other.raw  of=tmp bs=1 count=4964102 && mv tmp 1506935061275-other.raw
dd if=1506961537920-train.raw  of=tmp bs=1 count=4013830 && mv tmp 1506961537920-train.raw
dd if=1507082941184-other.raw  of=tmp bs=1 count=5274902 && mv tmp 1507082941184-other.raw
dd if=1507234977723-other.raw  of=tmp bs=1 count=5046022 && mv tmp 1507234977723-other.raw
dd if=1507257911313-other.raw  of=tmp bs=1 count=4587270 && mv tmp 1507257911313-other.raw
dd if=1507259349364-other.raw  of=tmp bs=1 count=4292358 && mv tmp 1507259349364-other.raw
dd if=1507609273368-other.raw  of=tmp bs=1 count=5455622 && mv tmp 1507609273368-other.raw
dd if=1508448093569-none.raw   of=tmp bs=1 count=4628210 && mv tmp 1508448093569-none.raw
dd if=1514620984404-street.raw of=tmp bs=1 count=4783376 && mv tmp 1514620984404-street.raw
dd if=1519910987703-coffee.raw of=tmp bs=1 count=4668704 && mv tmp 1519910987703-coffee.raw
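
The per-file commands above could also be generalized so the sizes do not have to be typed by hand. The following is a minimal sketch, not part of the original report: the function name trim_odd_raw is hypothetical, it assumes the files are headerless 16-bit PCM (so a valid file has an even size), and it relies on head -c, which is widely available but not guaranteed by POSIX.

```shell
#!/bin/sh
# trim_odd_raw (hypothetical helper): drop the last byte of every given
# file whose size in bytes is odd. Files with an even size are untouched.
trim_odd_raw() {
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        size=$((size))  # normalize: some wc implementations pad with spaces
        if [ $((size % 2)) -ne 0 ]; then
            tmp="$f.tmp.$$"
            # Copy everything except the final byte, then replace the original.
            head -c "$((size - 1))" "$f" > "$tmp" && mv "$tmp" "$f"
            echo "trimmed: $f"
        fi
    done
}
```

It would be called as trim_odd_raw *.raw from the directory holding the extracted contributions. Working on a copy of the data first seems prudent, since the files are replaced in place.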


jmvalin · 2025-01-22T02:53:24Z

Thanks. Indeed something went horribly wrong with these files.

jmvalin · 2025-01-29T03:58:00Z

I have fixed the odd-sized files you pointed out, removed the ones with voice (left all the ones with loud noises) and updated the .tar.gz file (same URL). Let me know if you find any other issue with those files. I've started training some new models and they indeed behave a lot better. Thanks for catching these issues and please let me know if you find any other issue with these contributions.

codesoap · 2025-01-29T07:08:57Z

Thanks for the update! It's nice to hear that the changes make a noticeable difference, and I'm looking forward to comparing the results myself once the rnnoise_demo tool is updated.

jmvalin · 2025-01-30T05:19:50Z

Just pushed a new model trained with the rnnoise_contribution dataset (previously it was just data I collected myself) and with some augmentation/loss tuning. Let me know how it goes.

codesoap · 2025-01-30T17:23:30Z

I've created two test files using this audio:

Act 1 of a reading of "Alice in Wonderland" as voice: https://librivox.org/alice-in-wonderland-by-alice-gerstenberg
The noise of test file 1 is a steam boat: https://opengameart.org/content/steamboat-engine-sound
The noise of test file 2 is similar to white noise: https://opengameart.org/content/background-rumble-noise

Test File 1

Uncleaned:

steamboat_alice.mp4

Cleaned with the old model (5e78):

steamboat_alice.cleaned.old.5e78.mp4

Cleaned with the new model (0a87):

steamboat_alice.cleaned.0a87.mp4

Test File 2

Uncleaned:

ambient_noise_alice.mp4

Cleaned with the old model (5e78):

ambient_noise_alice.cleaned.5e78.mp4

Cleaned with the new model (0a87):

ambient_noise_alice.cleaned.0a87.mp4

My Judgement

Honestly, I cannot tell the difference between the old and new model.

codesoap · 2025-01-30T20:20:29Z

I had one more idea for an experiment: Could the voice activity detection of the RNN be used to identify noise contributions that contain voice? This might make it easy to identify more files that contain clear voices.
