
Voice Changer Intro

This set of documents is a quick walkthrough of how to train RVC models and use them in a voice changer running on your GPU. It was created by yum_food and posted to the VR Dance Academy Discord server; it is reproduced here with permission for archival purposes.

It is intended to serve as a detailed walkthrough/Q&A about using the RVC project's retrieval-based voice conversion tool (rvc-beta).

Using this tool requires two steps:

  1. train a model on a specific voice
  2. use that model to transform your own voice.

Gotchas:

  • This tool is pretty heavy while running (I dropped to ~30 FPS in LS Media).
  • I get the best audio quality with a config that adds ~1.5-2.5 seconds of latency.
  • Training with 50 minutes of training data (5 × 10-minute clips), 160 epochs, and batch_size=40 took me ~1 hour on a 3090 and a 5900x.
  • Training takes up ~50 GB of disk space (training data and snapshots).
  • You need roughly an hour of very clean audio to train the model to a decent level of quality. More data + cleaner data + longer training time => better results.
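The numbers in the gotchas above can be sketched as a quick back-of-envelope calculation. All values are taken from the text; the hardware timing is the author's reported measurement, not something this snippet computes.

```python
# Back-of-envelope numbers from the gotchas above (values from the text).
clips = 5
minutes_per_clip = 10
total_minutes = clips * minutes_per_clip  # 50 minutes of training audio

epochs = 160
batch_size = 40
disk_usage_gb = 50  # approximate: training data + snapshots

# Reported result: ~1 hour of training on a 3090 + 5900x with these settings.
print(f"{total_minutes} min of audio, {epochs} epochs, "
      f"batch_size={batch_size}, ~{disk_usage_gb} GB on disk")
```

These are rough figures for one specific GPU/CPU combination; expect different training times and disk usage on other hardware.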

Models can be shared, so you can skip most of the setup process if you get someone to train one for you.

Quick note: even if you're only going to use other people's checkpoints, still complete steps 1-2 of the training instructions.

This information is derived from the RVC-beta GitHub repository: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI