Dysarthric Speech Conversion - Audio Samples

These audio samples come from a phonetics course project and personal testing during my master's program at SNU.

My aim was to improve the intelligibility of dysarthric speech.

For full details about the project, click the link below.

Contents:


PSOLA-based Voice Conversion

Samples generated by automatically modifying the phoneme durations and pitch of dysarthric speech to bring them closer to the values of healthy speech.
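As a rough illustration of how target values for such a modification could be derived, the sketch below computes a per-phone duration factor and a single global pitch factor from a dysarthric and a healthy recording of the same sentence. It assumes phone alignments are already available as (label, start, end) tuples and F0 tracks as lists in Hz with 0 marking unvoiced frames. The function names and data layout are hypothetical and not taken from the project, and the PSOLA resynthesis itself (for example with a tool such as Praat) is not shown.

```python
from statistics import median


def duration_ratios(dys_phones, healthy_phones):
    """Return one stretch factor per phone: healthy duration / dysarthric duration.

    Both arguments are lists of (label, start_sec, end_sec) tuples and are
    assumed to contain the same phone sequence in the same order.
    """
    if len(dys_phones) != len(healthy_phones):
        raise ValueError("phone sequences differ in length")
    ratios = []
    for (d_label, d_start, d_end), (h_label, h_start, h_end) in zip(
        dys_phones, healthy_phones
    ):
        if d_label != h_label:
            raise ValueError(f"phone mismatch: {d_label!r} vs {h_label!r}")
        # A factor below 1 shortens the dysarthric phone, above 1 lengthens it.
        ratios.append((h_end - h_start) / (d_end - d_start))
    return ratios


def pitch_scale(dys_f0, healthy_f0):
    """Return a global pitch factor: median healthy F0 / median dysarthric F0.

    Frames with F0 == 0 are treated as unvoiced and ignored (an assumption
    about the F0 tracker's output format).
    """
    voiced_dys = [f for f in dys_f0 if f > 0]
    voiced_healthy = [f for f in healthy_f0 if f > 0]
    return median(voiced_healthy) / median(voiced_dys)


# Made-up alignments and F0 values, for illustration only.
dys = [("s", 0.0, 0.5), ("o", 0.5, 1.5)]
healthy = [("s", 0.0, 0.25), ("o", 0.25, 0.75)]
factors = duration_ratios(dys, healthy)
scale = pitch_scale([0, 100, 110, 120], [0, 150, 165, 180])
```

In this toy case each dysarthric phone is twice as long as its healthy counterpart, so each factor asks the resynthesis step to halve the duration. A single median-based pitch factor is only one possible choice; a phone-by-phone pitch target would follow the same pattern.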

Original Dysarthric Speech

Dysarthric speaker saying "추석에는 온 가족이 함께 송편을 만든다." ("On Chuseok, the whole family makes songpyeon together.")

Original Healthy Speech

Healthy speaker saying "추석에는 온 가족이 함께 송편을 만든다." ("On Chuseok, the whole family makes songpyeon together.")

Modified Dysarthric Speech

Phone-level duration changes to match the healthy speaker.

Modified Dysarthric Speech 2

Phone-level duration and pitch changes to match the healthy speaker.


Voice Cloning with Transfer Learning

Samples generated by (1) extracting speaker embeddings, (2) predicting a mel spectrogram from a sequence of grapheme inputs, and (3) converting the spectrogram into a time-domain waveform.
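The data flow through these three stages can be sketched as follows. This is only a structural outline under stated assumptions: the three `toy_*` functions are hypothetical placeholders that stand in for trained models (a speaker encoder, a synthesizer, and a vocoder) and produce meaningless numbers. Only the order of the stages and the shapes passed between them reflect the description above.

```python
from typing import Callable, List, Sequence

Embedding = List[float]
Mel = List[List[float]]  # frames x mel bins
Waveform = List[float]


def clone_voice(
    reference_audio: Sequence[float],
    text: str,
    encoder: Callable[[Sequence[float]], Embedding],
    synthesizer: Callable[[List[str], Embedding], Mel],
    vocoder: Callable[[Mel], Waveform],
) -> Waveform:
    """Run the three stages in order and return a waveform."""
    embedding = encoder(reference_audio)      # 1. speaker embedding
    mel = synthesizer(list(text), embedding)  # 2. graphemes -> mel spectrogram
    return vocoder(mel)                       # 3. mel spectrogram -> waveform


# Placeholder stages (NOT real models), used only to make the sketch runnable.
def toy_encoder(audio: Sequence[float], dim: int = 4) -> Embedding:
    """Summarize the audio as `dim` chunk-wise mean absolute amplitudes."""
    chunk = max(1, len(audio) // dim)
    return [
        sum(abs(x) for x in audio[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(dim)
    ]


def toy_synthesizer(graphemes: List[str], embedding: Embedding, n_mels: int = 8) -> Mel:
    """Emit one constant mel frame per grapheme, scaled by the embedding mean."""
    speaker_gain = sum(embedding) / len(embedding)
    return [[(ord(g) % 7) * speaker_gain] * n_mels for g in graphemes]


def toy_vocoder(mel: Mel, hop_length: int = 4) -> Waveform:
    """Repeat each frame's mean value `hop_length` times as output samples."""
    return [sum(frame) / len(frame) for frame in mel for _ in range(hop_length)]


# Made-up reference audio; a real input would be a few seconds of speech.
waveform = clone_voice([0.1] * 16, "송편", toy_encoder, toy_synthesizer, toy_vocoder)
```

Passing the stages in as callables keeps them independent, which matches the idea that the speaker embedding is extracted once from a short reference clip and then conditions the synthesizer for any new text.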

Samples for Speaker with Mild Dysarthria

Original input to the model (note that only 6 s of audio was used).

Cloned voice.

Samples for Speaker with Moderate Dysarthria

Original input to the model (note that only 5 s of audio was used).

Cloned voice.

Cloned voice using a longer input (43 s).

Samples for Speaker with Severe Dysarthria

Original input to the model (note that only 5 s of audio was used).

Cloned voice.

Cloned voice using a longer input (80 s).