
Voice Changer Intro

This set of documents is a quick walkthrough of how to train RVC models and use them in a voice changer running on your GPU. It was created by yum_food and posted to the VR Dance Academy Discord server; it is reproduced here with permission for archival purposes.

It is intended to serve as a detailed walkthrough/Q&A about using the RVC project's retrieval-based voice conversion tool (rvc-beta).

Using this tool requires two steps:

  1. train a model on a specific voice
  2. use that model to transform your own voice.

Gotchas:

  • This tool is pretty heavy while running (I dropped to ~30 FPS in LS Media).
  • I get the best audio quality with a config that adds ~1.5-2.5 seconds of latency.
  • Training with 50 minutes of training data (5 × 10-minute clips), 160 epochs, and batch_size=40 took me ~1 hour on a 3090 and a 5900x.
  • Training takes up ~50 GB of disk space (training data and snapshots).
  • You need roughly an hour of very clean audio to train the model to a decent level of quality. More data + cleaner data + longer training time => better results.
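The numbers in the gotchas above can be sketched as a quick back-of-envelope calculation. All values are taken from the text; the hardware timing is the author's reported measurement, not something this snippet computes.

```python
# Back-of-envelope numbers from the gotchas above (values from the text).
clips = 5
minutes_per_clip = 10
total_minutes = clips * minutes_per_clip  # 50 minutes of training audio

epochs = 160
batch_size = 40
disk_usage_gb = 50  # approximate: training data + snapshots

# Reported result: ~1 hour of training on a 3090 + 5900x with these settings.
print(f"{total_minutes} min of audio, {epochs} epochs, "
      f"batch_size={batch_size}, ~{disk_usage_gb} GB on disk")
```

These are rough figures for one specific GPU/CPU combination; expect different training times and disk usage on other hardware.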

Models can be shared, so you can skip most of the setup process if you get someone to train one for you.

Quick note: even if you're only going to use other people's checkpoints, still complete steps 1-2 of the training instructions.

This information is derived from the RVC-beta GitHub repository: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI