Why speech synthesis?
Speech synthesis is a fun way to experiment with sound. If you’re looking for new ways to add voice-like qualities to your synths, guitars and pads, this is a great place to start. Speech technology lets us generate synthetic speech, and it also lets us impose the resonant characteristics of vowels on other recorded sound sources. The possible combinations are practically endless, which makes speech synthesis a rich platform for sound transformations. Let’s start with a simple instrument that shapes the frequency content of our recordings with the timbre of vowel sounds. MATLAB is well suited to experimenting with sound and testing new audio processing ideas, so we can quickly build and prototype this sound design instrument.
How?
One way to synthesize vowels is to connect three bandpass resonators in cascade (in series), each tuned to one formant frequency of the vowel. Our synthesizer’s model should look like this:
Figure 1
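For reference, the cascade in Figure 1 multiplies the frequency responses of the three resonators. One standard resonator, the digital resonator described by Klatt (1980), is written below with T = 1/Fs, F_k the k-th formant frequency and BW_k its bandwidth. The iirpeak filter used in peakf later in this post is a different second-order design, so treat this as one common formulation rather than the exact filter in the script.

$$H(z) = H_1(z)\,H_2(z)\,H_3(z), \qquad H_k(z) = \frac{A_k}{1 - B_k z^{-1} - C_k z^{-2}}$$

$$C_k = -e^{-2\pi \mathrm{BW}_k T}, \qquad B_k = 2e^{-\pi \mathrm{BW}_k T}\cos(2\pi F_k T), \qquad A_k = 1 - B_k - C_k$$

In the time domain each stage computes y[n] = A_k x[n] + B_k y[n-1] + C_k y[n-2], and the output of one stage is the input of the next. The choice A_k = 1 - B_k - C_k gives each stage unity gain at 0 Hz.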
In order to build our synthesizer we need to complete the following tasks:
- Implement a resonator: this will be our building block.
- Connect three resonators in cascade according to the schematic in Figure 1.
- Connect a test signal / audio file to the input of our series of resonators.
- Look up the formant frequencies of the chosen vowel in the chart (Figure 2).
- Set the center frequencies of the resonators to the formant values in Figure 2.
- Play the result.
- Render the output to a .wav file.
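For the third step any mono recording works, but a synthetic test tone is handy when no recording is at hand. The sketch below is an assumption, not part of the original post: it generates a naive (not band-limited, so slightly aliased) 110 Hz sawtooth, whose dense harmonic spectrum makes the formant peaks easy to hear. It uses the same variable names as the script (sig, Fs), so it can stand in for the audioread line.

```matlab
% Hypothetical test signal for step 3 (not from the original post).
% A naive sawtooth: rich in harmonics, but NOT band-limited.
Fs  = 44100;                          % sampling frequency (Hz)
dur = 2;                              % duration (s)
f0  = 110;                            % pitch of the test tone (Hz)
t   = (0:round(dur*Fs)-1).' / Fs;     % column vector of sample times (s)
sig = 2*mod(f0*t, 1) - 1;             % ramp from -1 up to almost +1, repeating at f0
sig = sig - mean(sig);                % remove any DC offset
sig = 0.5 * sig;                      % leave headroom before filtering
% audiowrite('mySound.wav', sig, Fs); % uncomment to save it as the script's input
```

A band-limited oscillator would sound cleaner at high pitches, but for a low test tone the naive version is usually good enough to audition vowel filters.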
The model in Figure 1 is a simplified version of a cascade formant synthesizer, but it is a good starting point for filtering our recordings with vowels. The following MATLAB code gives our sounds the sonic character of vowels. We can hear the effect of different vowel sounds by setting the center frequencies of the filters to the formant frequencies in Figure 2. The table below lists example values for the variables fc1, fc2 and fc3.
Vowel   F1 (Hz)   F2 (Hz)   F3 (Hz)
[a]     700       1220      2600
[ae]    620       1660      2430
[o]     540       1100      2300
[u]     320       900       2200
[i]     400       1800      2570
Figure 2: Formant frequencies of vowels
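To switch vowels without retyping numbers, the chart can be stored in MATLAB and looked up by name. This is a hypothetical helper structure (the field names name and F are illustrative choices, not from the original script); it simply copies one row of Figure 2 into fc1, fc2 and fc3, ready for the filtering script.

```matlab
% Figure 2 stored as a struct array: one element per vowel.
vowels = struct( ...
    'name', {'a', 'ae', 'o', 'u', 'i'}, ...
    'F',    {[700 1220 2600], [620 1660 2430], [540 1100 2300], ...
             [320 900 2200],  [400 1800 2570]});   % [F1 F2 F3] in Hz

choice = 'u';                                      % vowel to apply
idx = find(strcmp({vowels.name}, choice), 1);      % find its entry
assert(~isempty(idx), 'Unknown vowel: %s', choice);

fc1 = vowels(idx).F(1);   % first formant (Hz)
fc2 = vowels(idx).F(2);   % second formant (Hz)
fc3 = vowels(idx).F(3);   % third formant (Hz)
fprintf('[%s]: fc1 = %d Hz, fc2 = %d Hz, fc3 = %d Hz\n', choice, fc1, fc2, fc3);
```

Changing choice to 'a', 'ae', 'o' or 'i' selects the other rows of the chart.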
For more details on speech synthesis and formant frequency values:
Klatt, Dennis H. "Software for a cascade/parallel formant synthesizer." The Journal of the Acoustical Society of America 67, no. 3 (1980): 971-995.
I made an example that plays an original mono synth track and then the same track filtered with each of the vowel patterns in Figure 2.
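A demo like this can be sketched in a few lines. This version makes two assumptions that differ from the original: a synthetic 110 Hz sawtooth stands in for the synth track, and a Klatt-style resonator built with core MATLAB's filter replaces peakf/iirpeak, so it runs without a toolbox. Each vowel segment is normalized separately and followed by 250 ms of silence.

```matlab
% Demo sketch: dry track, then the track through each vowel of Figure 2.
Fs   = 44100;                             % sampling frequency (Hz)
t    = (0:Fs-1).' / Fs;                   % 1 s per segment
sig  = 0.5 * (2*mod(110*t, 1) - 1);       % stand-in mono "synth track" (sawtooth)
band = 120;                               % bandwidth of each resonator (Hz)

% Klatt-style resonator: y[n] = A x[n] + B y[n-1] + C y[n-2], with A = 1 - B - C
reson = @(x, F, BW) filter( ...
    1 - 2*exp(-pi*BW/Fs)*cos(2*pi*F/Fs) + exp(-2*pi*BW/Fs), ...  % numerator: A
    [1, -2*exp(-pi*BW/Fs)*cos(2*pi*F/Fs), exp(-2*pi*BW/Fs)], ... % denominator: [1 -B -C]
    x);

formants = [700 1220 2600; 620 1660 2430; 540 1100 2300; ...
            320  900 2200; 400 1800 2570];  % rows: [a] [ae] [o] [u] [i]
gap = zeros(round(0.25*Fs), 1);           % 250 ms of silence between segments

demo = [sig / max(abs(sig)); gap];        % start with the dry signal
for k = 1:size(formants, 1)
    y = sig;
    for m = 1:3                           % cascade: each output feeds the next stage
        y = reson(y, formants(k, m), band);
    end
    % normalize each vowel segment to a peak of 1 and append it
    demo = [demo; y / max(abs(y)); gap];  %#ok<AGROW>
end
% sound(demo, Fs);                        % listen to the result
% audiowrite('vowelDemo.wav', 0.9*demo, Fs);
```

Replacing sig with the output of audioread turns this into a demo of your own track.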
MATLAB implementation: Formant Synthesizer
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Simple cascade vowel synthesis.
%%% author: Michele Pizzi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[sig,Fs] = audioread('mySound.wav');   % input audio file
fc1 = 700;    % Hz, first formant
fc2 = 1220;   % Hz, second formant
fc3 = 2600;   % Hz, third formant
band = 120;   % Hz, bandwidth of each resonator
f1 = peakf(sig,fc1,band,Fs);   % filter first formant
f2 = peakf(f1,fc2,band,Fs);    % filter second formant
f3 = peakf(f2,fc3,band,Fs);    % filter third formant
out = f3 ./ max(abs(f3));      % normalize the peak level to 1
sound(out,Fs);                 % play the result
audiowrite('myVowel.wav',out,Fs);   % write the output to an audio file
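The script above needs the function peakf.m, saved in the same folder or elsewhere on the MATLAB path, in order to run. Note that peakf relies on iirpeak, which is not part of core MATLAB, so the toolbox that provides it must be installed. The function can be implemented as follows: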
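function out = peakf(in,Fc,bandW,Fs)
% PEAKF  Second-order peak (resonator) filter.
%   in:    input signal (vector, or matrix with one channel per column)
%   Fc:    center frequency in Hz (must be below Fs/2)
%   bandW: -3 dB bandwidth in Hz
%   Fs:    sampling frequency in Hz
wo = Fc/(Fs/2);        % normalized center frequency (1 = Nyquist)
bw = bandW/(Fs/2);     % normalized bandwidth
[b,a] = iirpeak(wo,bw);
out = filter(b,a,in);  % filter() works along the first dimension
end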
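If iirpeak is not available, a resonator can be built from the Klatt (1980) coefficient formulas with core MATLAB only. This is a swapped-in technique, not an exact equivalent: the Klatt resonator has unity gain at 0 Hz, while iirpeak's peaking filter has zeros at 0 Hz and at Fs/2, so the two will sound somewhat different. The sketch computes the coefficients for one formant and checks where the magnitude response peaks, evaluating it directly instead of calling freqz.

```matlab
% Toolbox-free resonator coefficients (Klatt-style), for one formant.
Fs    = 44100;     % sampling frequency (Hz)
Fc    = 700;       % center frequency (Hz)
bandW = 120;       % bandwidth (Hz)

T = 1/Fs;
C = -exp(-2*pi*bandW*T);                 % negated squared pole radius (sets bandwidth)
B = 2*exp(-pi*bandW*T)*cos(2*pi*Fc*T);   % sets the pole angle (center frequency)
A = 1 - B - C;                           % makes the gain at 0 Hz equal to 1
b = A;                                   % numerator for filter()
a = [1, -B, -C];                         % denominator: 1 - B z^-1 - C z^-2

% Evaluate the frequency response on a 1 Hz grid, core MATLAB only
f  = (0:1:Fs/2).';                       % frequencies from 0 Hz to Nyquist
z1 = exp(-1j*2*pi*f*T);                  % z^-1 on the unit circle
H  = b ./ (a(1) + a(2)*z1 + a(3)*z1.^2);
[~, iPeak] = max(abs(H));
fprintf('peak at %d Hz, gain %.1f dB\n', f(iPeak), 20*log10(abs(H(iPeak))));
```

To use it in the script, wrap the coefficient lines in a function with the same signature as peakf and end it with out = filter(b, a, in); the peak lands slightly below Fc for low formants, which is normal for this resonator.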