Voice Changer Intro
This set of documents is a quick walkthrough of how to train RVC models and use them in a voice changer running on your GPU. It was written by yum_food and posted to the VR Dance Academy server, and is reproduced here with permission for archival purposes.
It is intended to serve as a detailed walkthrough/Q&A about using the RVC project's retrieval-based voice conversion tool (rvc-beta).
Using this tool involves two steps:
- training a model on a specific voice
- then using that model to convert your own voice.
Gotchas:
- This tool is pretty heavy while running (I was down to ~30 FPS in LS Media).
- I get the best audio quality with a config that adds ~1.5-2.5 seconds of latency.
- Training on 50 minutes of data (five 10-minute clips), 160 epochs, with batch_size=40 took me ~1 hour on a 3090 and a 5900X.
- Training will take up ~50 GB of disk space (training data & snapshots).
- You need roughly an hour of very clean audio to train the model to an OK level of quality. More data, cleaner data, and longer training all improve the result.
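The latency figure above comes from buffering: a block-based voice changer must collect a chunk of audio before it can convert it. As a rough sanity check, here is a minimal sketch of that relationship; the 48 kHz sample rate and the split into "block" plus "extra buffering" samples are illustrative assumptions, not settings from this guide.

```python
# Rough estimate of latency added by audio buffering in a block-based
# voice changer. Assumptions (not from the guide): 48 kHz sample rate;
# added latency is approximately the processing block plus any extra
# crossfade/lookahead buffering, divided by the sample rate.
def added_latency_seconds(block_samples: int,
                          extra_samples: int = 0,
                          sample_rate: int = 48000) -> float:
    """Return the approximate latency added by buffering, in seconds."""
    return (block_samples + extra_samples) / sample_rate

# A 1-second block plus 0.5 s of extra buffering lands in the
# ~1.5-2.5 s range mentioned above.
print(added_latency_seconds(48000, 24000))  # → 1.5
```

Smaller blocks reduce latency but give the model less context per chunk, which is why the lower-latency configs tend to sound worse.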
Models can be shared, so you can skip most of the setup process if you get someone to train one for you.
Quick note: even if you only plan to use other people's checkpoints, still complete steps 1-2 of the training instructions above.
This information is derived from the RVC-beta GitHub repo: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI