Audio samples come from a Phonetics course project and personal testing during my master's program at SNU.
My aim was to improve the intelligibility of dysarthric speech.
For full details about the project, click the link below.
Samples were generated by automatically modifying the phoneme durations and pitch of dysarthric speech toward healthy-speaker values.
Dysarthric speaker saying 추석에는 온 가족이 함께 송편을 만든다 ("During Chuseok, the whole family makes songpyeon together").
Healthy speaker saying 추석에는 온 가족이 함께 송편을 만든다 ("During Chuseok, the whole family makes songpyeon together").
Phone-based duration changes to match the healthy speaker.
Phone-based duration and pitch changes to match the healthy speaker.
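The phone-level duration matching above can be sketched as follows. This is a minimal toy illustration, not the project's actual code: it assumes phone boundaries are already available (e.g. from a forced aligner) and uses simple linear-interpolation resampling as a stand-in for the higher-quality time-stretching (e.g. PSOLA or vocoder-based) a real system would use. All function and parameter names are my own.

```python
import numpy as np

def stretch_segment(seg: np.ndarray, target_len: int) -> np.ndarray:
    """Resample one phone segment to target_len samples via linear
    interpolation (toy stand-in for PSOLA/vocoder-based stretching)."""
    src = np.linspace(0.0, 1.0, num=len(seg))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, seg)

def match_durations(wav, sr, phones, healthy_durs):
    """phones: list of (start_sec, end_sec) for each phone in the
    dysarthric utterance; healthy_durs: target duration in seconds of
    the corresponding phone from the healthy speaker."""
    out = []
    for (start, end), dur in zip(phones, healthy_durs):
        seg = wav[int(start * sr):int(end * sr)]
        out.append(stretch_segment(seg, max(1, int(dur * sr))))
    return np.concatenate(out)
```

Pitch matching works analogously, but operates on an extracted F0 contour (e.g. via a vocoder such as WORLD) rather than on raw samples.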
Samples were generated by (1) extracting speaker embeddings, (2) predicting a mel spectrogram from a sequence of grapheme inputs, and (3) converting the spectrograms into time-domain waveforms.
Original input to the model (note: only 6 s of audio was used).
Cloned voice.
Original input to the model (note: only 5 s of audio was used).
Cloned voice.
Cloned voice using a longer input (43 s).
Original input to the model (note: only 5 s of audio was used).
Cloned voice.
Cloned voice using a longer input (80 s).
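The three-stage cloning pipeline described above can be sketched as a skeleton. This is only an illustration of the data flow, with placeholder implementations: the real system used trained neural networks (a speaker-verification encoder, a Tacotron-style text-to-spectrogram synthesizer, and a neural vocoder). All dimensions, names, and constants here are assumptions for the sketch.

```python
import numpy as np

# Assumed, illustrative dimensions (not from the original project).
EMB_DIM, N_MELS, FRAMES_PER_CHAR, HOP = 256, 80, 10, 200

def extract_speaker_embedding(ref_wav: np.ndarray) -> np.ndarray:
    """Step 1: map a reference clip to a fixed-size speaker embedding.
    (Placeholder: a real system uses a trained speaker encoder.)"""
    rng = np.random.default_rng(abs(int(ref_wav.sum() * 1e6)) % (2**32))
    emb = rng.standard_normal(EMB_DIM)
    return emb / np.linalg.norm(emb)

def predict_mel(graphemes: str, spk_emb: np.ndarray) -> np.ndarray:
    """Step 2: predict a mel spectrogram from graphemes, conditioned on
    the speaker embedding. (Placeholder for a seq2seq synthesizer.)"""
    n_frames = FRAMES_PER_CHAR * len(graphemes)
    return np.zeros((N_MELS, n_frames))

def vocode(mel: np.ndarray) -> np.ndarray:
    """Step 3: invert the mel spectrogram to a time-domain waveform.
    (Placeholder for Griffin-Lim or a neural vocoder.)"""
    return np.zeros(mel.shape[1] * HOP)

# End-to-end flow: ~6 s reference clip -> embedding -> mel -> waveform.
ref = np.sin(np.linspace(0.0, 1000.0, 16000 * 6))
cloned = vocode(predict_mel("hello", extract_speaker_embedding(ref)))
```

The longer-input comparisons above probe step 1: with more reference audio, the speaker embedding averages over more speech, which typically yields a closer match to the target voice.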