Saying goodbye to MIDI-A/B modes in this forked repository #67
Closed · yqzhishen announced in Announcements · 1 comment · 1 reply
- Decoupling from the phoneme durations and f0 predictor sounds really sensible.
We will perform lots of clean-ups and refactoring in the next version, with breaking changes described in the following release:
https://github.com/openvpi/DiffSinger/releases/tag/v1.7.1
Why remove the MIDI modes?
MIDI modes do have the ability to predict phoneme durations (MIDI-A and MIDI-B) and pitch (MIDI-A) from music score inputs (so-called auto-tuning by some people). However, MIDI modes also have many disadvantages.
What is the recommended mode now?
We recommend that all users move to MIDI-less mode at this point. This will be the standard mode of the DiffSinger acoustic model in the future, with all pipelines and facilities (engines, editors) focusing on it. Its datasets are easy to label and build with the provided tools and pipelines, and it offers better performance and controllability than MIDI-A/B. Users can also enjoy the latest features, such as multi-speaker models, dynamic speaker mix, data augmentation, and gender and velocity control, in MIDI-less mode.
Will MIDI modes be completely deleted?
No, at least not for now. There is one disadvantage of MIDI-less mode: it cannot predict phoneme durations on its own. The work-around is rhythmizers (FastSpeech2Encoder + DurationPredictor), which are taken from the MIDI-A mode. The code of MIDI-A/B will be kept in the release above, and a branch for MIDI-A/B will be kept even as the main branch advances. We will keep MIDI-A/B (although without maintenance) in this repository until we finish developing better alternatives and the corresponding customized pipelines that let everyone easily prepare and train their own duration/f0/... models.
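For readers unfamiliar with that structure, below is a minimal PyTorch sketch of the general rhythmizer idea: an encoder over phoneme tokens followed by a small duration-predictor head. This is not the project's actual FastSpeech2Encoder/DurationPredictor code; the class name, layer sizes, and toy usage are all illustrative assumptions.

```python
# Minimal sketch of a rhythmizer-style module (illustrative only, not the
# DiffSinger implementation): encode phoneme tokens, then predict a
# log-duration per phoneme.
import torch
import torch.nn as nn


class SketchRhythmizer(nn.Module):  # hypothetical name
    def __init__(self, vocab_size=64, hidden=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Duration predictor head: maps each encoded phoneme to a scalar.
        self.duration_predictor = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, phoneme_ids):
        # phoneme_ids: (batch, n_phonemes) integer tokens
        h = self.encoder(self.embed(phoneme_ids))
        return self.duration_predictor(h).squeeze(-1)  # (batch, n_phonemes)


# Toy usage: convert predicted log-durations into whole frame counts.
model = SketchRhythmizer()
phonemes = torch.randint(0, 64, (1, 10))
frames = torch.clamp(torch.exp(model(phonemes)).round().long(), min=1)
```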
Will MIDI-related stuff come back in the future?
Absolutely yes! Our goal is to let users simply input music scores and lyrics to generate nice singing voices, so of course we must deal with MIDI.
However, we will limit the usage of MIDI inputs: they will only be used to predict phoneme durations, f0 and other parameters. These components will be called variance adaptors or variance models, and their outputs can be directly consumed by the current MIDI-less acoustic models. With this decoupled cascade architecture, we can achieve higher flexibility, quality and controllability.
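As a rough illustration of that cascade, here is a minimal sketch under assumed, simplified tensor interfaces (the real DiffSinger interfaces differ): a hypothetical variance model turns the score into phoneme durations and an f0 curve, and a MIDI-less acoustic model then consumes only phonemes, durations and f0, never the MIDI score itself.

```python
# Sketch of the decoupled cascade: score -> variance model -> acoustic model.
# Both functions are hypothetical placeholders with made-up interfaces.
import torch


def variance_model(phoneme_ids, midi_pitches):
    """Predict per-phoneme durations (frames) and a frame-level f0 curve (Hz)
    from the music score. Placeholder outputs only."""
    n = phoneme_ids.shape[-1]
    durations = torch.full((n,), 12, dtype=torch.long)   # dummy: 12 frames each
    f0 = 220.0 * torch.ones(int(durations.sum()))        # dummy: flat 220 Hz
    return durations, f0


def acoustic_model(phoneme_ids, durations, f0):
    """A MIDI-less acoustic model never sees the score, only phonemes,
    durations and f0. Placeholder output only."""
    n_frames = int(durations.sum())
    return torch.zeros(n_frames, 128)                    # dummy mel frames


# The cascade wiring: the variance model's outputs feed the acoustic model.
phonemes = torch.randint(0, 64, (10,))
midi = torch.randint(48, 72, (10,))
durations, f0 = variance_model(phonemes, midi)
mel = acoustic_model(phonemes, durations, f0)
```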