AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

AIGC-Audio/AudioGPT

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md.

Capabilities

The tables in this section list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, refer to asset.

Currently, not every model has a repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Acknowledgement

We are grateful to the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion
