Noise contributions contain invalid PCM data #253
Comments
Thanks. Indeed something went horribly wrong with these files.
I have fixed the odd-sized files you pointed out, removed the ones with voice (left all the ones with loud noises) and updated the .tar.gz file (same URL). I've started training some new models and they indeed behave a lot better. Thanks for catching these issues and please let me know if you find any other issue with these contributions.
Thanks for the update! It's nice to hear that the changes make a noticeable difference, and I'm looking forward to comparing the results myself, once the
Just pushed a new model trained with the rnnoise_contribution dataset (previously it was just data I collected myself) and with some augmentation/loss tuning. Let me know how it goes.
I've created two test files using this audio:
Test File 1
Uncleaned: steamboat_alice.mp4
Cleaned with the old model (5e78): steamboat_alice.cleaned.old.5e78.mp4
Cleaned with the new model (0a87): steamboat_alice.cleaned.0a87.mp4
Test File 2
Uncleaned: ambient_noise_alice.mp4
Cleaned with the old model (5e78): ambient_noise_alice.cleaned.5e78.mp4
Cleaned with the new model (0a87): ambient_noise_alice.cleaned.0a87.mp4
My Judgement
Honestly, I cannot tell the difference between the old and new model.
I had one more idea for an experiment: could the voice activity detection of the RNN be used to identify noise contributions that contain voice? That might make it easy to find more files that contain clear voices.
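A rough sketch of how that screening could look, in Python. It assumes the per-frame voice probabilities have already been collected elsewhere (rnnoise_process_frame returns one such value per 480-sample frame); the function names, the 0.9 threshold and the 50-frame minimum run are hypothetical choices for illustration, not values from RNNoise.

```python
def voiced_fraction(vad_probs, threshold=0.9):
    """Fraction of frames whose voice probability is at or above the threshold."""
    if not vad_probs:
        return 0.0
    return sum(1 for p in vad_probs if p >= threshold) / len(vad_probs)


def longest_voiced_run(vad_probs, threshold=0.9):
    """Length, in frames, of the longest consecutive run at or above the threshold."""
    longest = current = 0
    for p in vad_probs:
        if p >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def likely_contains_voice(vad_probs, threshold=0.9, min_run=50):
    """Flag a file when voice is detected for at least min_run consecutive frames.

    With 10 ms frames, the assumed default of 50 frames is about half a second.
    """
    return longest_voiced_run(vad_probs, threshold) >= min_run
```

Requiring a sustained run rather than a high overall fraction is meant to avoid flagging files where the detector only fires on isolated frames; flagged files would still need a manual listen.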
I have taken a closer look at the noise contributions at media.xiph.org/rnnoise/rnnoise_contributions.tar.gz and found some files that contain an odd number of bytes. Since the PCM format in use stores 16-bit values, this should be impossible. Tools like ffmpeg complain about this. Those are the problematic files:
The files still seem to contain valid audio data, so I think they could be salvaged by just trimming one byte off the end. This can be done with dd:
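As an alternative to dd, the same check and trim can be sketched in Python. This is a minimal sketch assuming the archive has been extracted to a local directory; the function names and the "*" glob pattern are hypothetical, and files are modified in place, so it is safer to run on a copy.

```python
import os
from pathlib import Path


def find_odd_sized(root, pattern="*"):
    """Return files under root whose size is odd, i.e. not a whole number of 16-bit samples."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_size % 2 == 1
    )


def trim_last_byte(path):
    """Drop the final byte of an odd-sized file in place; return True if it was trimmed."""
    size = os.path.getsize(path)
    if size % 2 == 1:
        os.truncate(path, size - 1)
        return True
    return False
```

Calling trim_last_byte on each path returned by find_odd_sized leaves every file with an even byte count; even-sized files are left untouched.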