LSA 2011 - Option 2 - "Building a TTS System"

Submission procedure explained below.

In this assignment, you are going to build your own limited-domain text-to-speech system. This is a complex task, so it is divided into three simpler subtasks.  The first you should do on your own.  The second and third you can do in teams.

 


Part A: Setup and first small TTS system

IMPORTANT: This part of the assignment cannot be done remotely. You have to do it on one of the Linux computers in the lab used for class. 

Note: You can try installing the Festival software on your own Linux laptop.  Instructions and downloads are available here.  Not for the faint of heart, but doable.  If you want to build your own Festival TTS on a Mac, here are Will Styler's notes.

 

 

A.1. Getting Started on Red Hat in the Lab (ECCE 152)

Log on with your username and password.  You will see a panel of announcements:  click 'OK' at the bottom to see your desktop.

Right-click on the desktop and choose "Open Terminal"

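You should be in /home/soils/?/<your username>, your home directory.  We will call this $HOMEDIR below.  Note that $HOMEDIR is shorthand used in this handout, not a variable the shell sets for you: in commands, type the actual path or use $HOME.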

A.2. Setting up shell and Festival paths

1) If your default shell is not bash, change it to bash.  NB: The default shell on soils is tcsh.  To find out what shell you currently have, use the command:

echo $SHELL

You will need to do this for each machine you use in the lab.

To change to bash, run the following command:

chsh -s /bin/bash

You'll need to log completely out (from the System menu at the top of your screen) and log in again for the new shell to take effect, but you may want to do step (2) first.

2) Now check to see if you have a .bashrc file in your home directory, i.e.

ls .bashrc

If you do not have this file, create it. 

3) Now add the following lines to .bashrc (you can use emacs or any other text editor available)

export PATH="/usr/local/festival/bin:/usr/local/speech_tools/bin:$PATH"
export ESTDIR="/usr/local/speech_tools"
export FESTVOXDIR="/usr/local/festvox"

export JHBIN="/home/soils/facstaff/juhi9052/bin"

Again,  you will have to log out and back in for the changes to the ~/.bashrc file to take effect.

Once you do this, you can check if the changes are in effect by running

echo $FESTVOXDIR

The value specified above (/usr/local/festvox) should be displayed.  If not, check .bashrc to make sure you have all quote marks, etc., just as above.
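
The same check can be run for all three exported variables at once. This is a minimal sketch for a POSIX shell; the function name check_festival_env is hypothetical and not part of Festival or the lab setup.

```shell
# Print each variable exported in .bashrc, or a warning if it is missing.
# Returns 0 when all three are set, 1 otherwise.
check_festival_env() {
    status=0
    for name in ESTDIR FESTVOXDIR JHBIN; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return "$status"
}
```

Paste the function into your terminal and run check_festival_env; any "is not set" line points to a line in .bashrc that needs checking.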

A.3. Small TTS system: a talking clock

NB: In http://festvox.org/bsv/x1003.html you can find a detailed explanation of each step.


 
Step 1. In your home directory ($HOMEDIR), create a directory 'tts' and cd into it. Then create another directory 'time' in $HOMEDIR/tts and cd there.

    mkdir tts
    cd tts
    mkdir time
    cd time

Step 2. Set up the directory.

    $FESTVOXDIR/src/ldom/setup_ldom SLP time xyz

At this point, take a look (e.g. using 'more' or emacs) at these two files, which are now in your 'time' directory:

etc/time.data, which contains a set of utterances that should cover all the possible variations in the domain;
festvox/SLP_time_xyz.scm, which defines several functions (in Scheme) to convert a time like "07:57" into an utterance like "The time is now, a little after five to eight, in the morning".

To build a new limited domain, these files must be rewritten. For this part of the homework you do not need to edit them, but you will in Parts B and C.

Step 3. Generate prompts. This step and the following ones assume you are in $HOMEDIR/tts/time.

    festival -b festvox/build_ldom.scm '(build_prompts "etc/time.data")'

Step 4. Record prompts. You need a microphone for this step. You will be asked to read out 24 prompts. Read the recording tips before starting.  If you have trouble getting clean recordings even after following the tips, you can record your prompts as .wav files on other machines and import them; instructions are here.

    $JHBIN/prompt_them etc/time.data

Step 5. Autolabel prompts.

    $JHBIN/make_labs prompt-wav/*.wav

Step 6. Build utterances.

    festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'

Step 7. Copy the prompt list to etc/txt.done.data.

    cp etc/time.data etc/txt.done.data

Step 8. Extract pitchmarks and fix them.

    bin/make_pm_wave wav/*.wav
    $JHBIN/make_pm_fix pm/*.pm

Step 9. Power normalization.

    bin/simple_powernormalize wav/*.wav

Step 10. MCEP vectors.

    bin/make_mcep wav/*.wav

Step 11. Build the LDOM synthesizer.

    festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'

Step 12. Run your synthesizer.

    festival festvox/SLP_time_xyz_ldom.scm '(voice_SLP_time_xyz_ldom)'

Step 13. Once in Festival, use these commands to make your synthesizer say the time. Use CTRL+D to exit Festival.

    (saytime)
    (saythistime "07:57")
    (saythistime "14:22")

Documentation

Requirements:

  1. To grade your work, I need to run your TTS system. Therefore, please double check that all your files under $HOMEDIR have read permissions for everybody, and also exec permissions in the case of directories (that is the default setting, so if you have not chmoded anything, you do not need to change anything).
  2. You should generate three wav files using your TTS with the following procedure:

    cd $HOMEDIR/tts/time
    festival

    (load "festvox/SLP_time_xyz_ldom.scm")
    (voice_SLP_time_xyz_ldom)

    (Parameter.set 'Audio_Method 'Audio_Command)
    (Parameter.set 'Audio_Required_Rate 16000)
    (Parameter.set 'Audio_Required_Format 'wav)

    (Parameter.set 'Audio_Command "cp $FILE time1.wav")
    (saytime)

    (Parameter.set 'Audio_Command "cp $FILE time2.wav")
    (saythistime "07:57")

    (Parameter.set 'Audio_Command "cp $FILE time3.wav")
    (saythistime "14:22")
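
Requirement 1 (world-readable files, world-searchable directories) can be checked before you submit. The sketch below assumes a POSIX find; list_unreadable is a hypothetical helper name, not a lab tool.

```shell
# List everything under directory $1 that other users cannot access.
list_unreadable() {
    # Files and directories without read permission for "other".
    find "$1" ! -perm -o=r -print
    # Directories without search (exec) permission for "other".
    find "$1" -type d ! -perm -o=x -print
}
```

For example, list_unreadable "$HOME/tts" prints nothing when the permissions are fine. If it does print paths, chmod -R a+rX "$HOME/tts" is one way to repair them.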
     


 

Part B: Preparing your limited-domain TTS

 

IMPORTANT: All steps in Part B can be done remotely.

B.1. Choosing a limited domain:

In Part A, you built a simple talking-clock TTS system. Now, you will choose a limited domain and record a series of prompts in that domain.  You can pretend you are building a Spoken Dialogue System and record the system side of the conversations you want to support. Also, instead of recording the prompts in a neutral voice, you may want to choose a particular style or personality that you think is most appropriate for your domain and application.

B.2. Designing the input and output of your TTS system

Define as formally as possible what the input and output of your TTS system is going to look like.

Your limited domain must have at least five degrees of freedom, and you have to provide an estimate of the number of possible sentences that could be generated.
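
One simple way to estimate the number of possible sentences is to multiply the number of choices in each degree of freedom. The sketch below assumes the slots vary independently, which makes the result an upper bound; the slot sizes in the example call are hypothetical.

```shell
# Multiply the number of choices per slot to estimate the sentence count.
count_sentences() {
    total=1
    for choices in "$@"; do
        total=$((total * choices))
    done
    echo "$total"
}

# Hypothetical five-slot domain: 4 qualifiers, 12 minute phrases,
# 12 hours, 3 parts of the day, 2 greetings.
count_sentences 4 12 12 3 2   # prints 3456
```

If some combinations are not allowed in your domain, subtract them from this figure and say so in your estimate.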

B.3. Setting things up

Log on to your Colorado account and run the following commands, where USERNAME is the user name of your account (e.g. fb2175), and TOPIC is a string such as 'number', 'weather', 'street', etc.:

Step 1. Create a directory for Part C.

    mkdir $HOMEDIR/partc

Step 2. Set up the directory for Part C.

    cd $HOMEDIR/partc
    $FESTVOXDIR/src/ldom/setup_ldom SLP TOPIC xyz

B.4. Designing the prompts

Next, you need to design the prompts for your TTS system. As you saw in Part A, the talking clock uses the prompts in time/etc/time.data:

( time0001 "The time is now, exactly five past one, in the morning." )
( time0002 "The time is now, just after ten past two, in the morning." )
( time0003 "The time is now, a little after quarter past three, in the morning." )
...

Now, you have to create a similar file for your domain, and save it as partc/etc/TOPIC.data.  NOTE:  The spaces after '(' and before ')'  in each line are critical.

For an explanation on how to design the prompts, go to http://www.festvox.org/bsv/c941.html#AEN952.
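
The spacing rule for the prompt file can be checked mechanically. The sketch below is an approximation: it assumes one prompt per line, an identifier made of letters, digits and underscores, and no trailing whitespace. The function name check_prompts is hypothetical.

```shell
# Print (with line numbers) every line of prompt file $1 that does not look
# like:  ( id "text" )
# Returns 0 if all lines match, 1 if any line is malformed, 2 if unreadable.
check_prompts() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1"
        return 2
    fi
    # -n: show line numbers, -v: select non-matching lines, -E: extended regex.
    if grep -n -v -E '^\( [A-Za-z0-9_]+ ".*" \)$' "$1"; then
        return 1
    fi
    return 0
}
```

Running check_prompts etc/TOPIC.data from the partc directory prints nothing when every line follows the pattern.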

Documentation


 

Part C: Completing your limited-domain TTS

IMPORTANT: Section C.2 of this part of the work cannot be done remotely. You have to do it on one of the Linux computers in the Lab.

C.1. Introduction

In Part B, you started building your limited-domain TTS system. You defined its input and output, and designed the set of prompts you will record.

Now, in order to complete your TTS system you need to:
a) record the set of prompts (section C.2);
b) write a script that transforms an input string into an English sentence, and sends it to Festival to synthesize it (section C.3).

C.2. Recording and processing the prompts

Log on locally to a computer in the ECCE 152 Lab. Open a Terminal window (Applications > Accessories > Terminal) and run the following commands, replacing USERNAME with the user name of your account (e.g. fb2175), and TOPIC with a string such as 'number', 'weather', 'street', etc.

Before starting, read the tips for part C, which you may find useful.

Step 1. Go to the Part C directory.

    cd $HOMEDIR/partc

Step 2. The file etc/TOPIC.data should contain the prompts you designed. Make sure that its syntax is correct.

Step 3. Generate prompts.

    festival -b festvox/build_ldom.scm '(build_prompts "etc/TOPIC.data")'

Step 4. Record prompts. You need a microphone for this step. You will be asked to read out, one by one, the prompts you designed. Before starting, review the recording tips.

    $JHBIN/prompt_them etc/TOPIC.data

Step 5. Autolabel prompts.

    $JHBIN/make_labs prompt-wav/*.wav

Step 6. Build utterances.

    festival -b festvox/build_ldom.scm '(build_utts "etc/TOPIC.data")'

Step 7. Copy the prompt list to etc/txt.done.data.

    cp etc/TOPIC.data etc/txt.done.data

Step 8. Extract pitchmarks and fix them.

    bin/make_pm_wave wav/*.wav
    $JHBIN/make_pm_fix pm/*.pm

Step 9. Power normalization.

    bin/simple_powernormalize wav/*.wav

Step 10. MCEP vectors.

    bin/make_mcep wav/*.wav

Step 11. Build the LDOM synthesizer.

    festival -b festvox/build_ldom.scm '(build_clunits "etc/TOPIC.data")'

Step 12. Run Festival (use CTRL+D to exit).

    festival festvox/SLP_TOPIC_xyz_ldom.scm '(voice_SLP_TOPIC_xyz_ldom)'

Step 13. Now you can make your synthesizer say sentences in your domain by hand. For example, in the time domain you could do that by running:

    (SayText "The time is now, a little after twenty past two, in the afternoon.")
 

Warning: If you use words that do not belong to your domain, the synthesizer will default to a Festival voice.
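
A rough pre-check for this warning is to list the words of a test sentence that never occur in your prompt file. The sketch below is only a hint: it treats a word as covered if it appears anywhere in the prompts, splits on anything that is not a letter or apostrophe, and knows nothing about how Festival actually selects units. The function name uncovered_words is hypothetical.

```shell
# Print the words of sentence $2 that do not occur in prompt file $1.
uncovered_words() {
    # Vocabulary: the quoted text of each prompt, lowercased, one word per line.
    vocab=$(sed -n 's/^[^"]*"\(.*\)"[^"]*$/\1/p' "$1" |
            tr 'A-Z' 'a-z' | tr -cs "a-z'" '\n' | sort -u)
    # Split the sentence the same way and report words missing from the vocabulary.
    printf '%s\n' "$2" | tr 'A-Z' 'a-z' | tr -cs "a-z'" '\n' | sort -u |
    while read -r word; do
        if [ -n "$word" ] && ! printf '%s\n' "$vocab" | grep -q -x -F -e "$word"; then
            echo "$word"
        fi
    done
}
```

For example, uncovered_words etc/TOPIC.data "some sentence to try" prints one uncovered word per line, and nothing if every word appears in the prompts.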
 

C.3. Writing a script that transforms the input and synthesizes it

Now, you need to write a script that receives an input string as you defined in Part B, and transforms it into an English sentence that your TTS system can synthesize. For example, in the time domain the script would transform a string like "14:22" into a sentence like "The time is now, a little after twenty past two, in the afternoon." A simple (and perhaps long) case or switch statement should be enough to achieve this.
You must test your script properly before submission (on one of the speech-lab machines). If you submit a script that does not work, you will lose a lot of points, and we will not debug your code.
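
To illustrate the case-statement approach for the time domain, here is a sketch written as a shell function. It is named generate_sentence after the function the assignment asks you to define, but it is not the Perl template. The rounding and wording rules are assumptions chosen to reproduce the example sentences in this handout, not rules taken from Festival.

```shell
# Sketch only: turn "HH:MM" (24-hour clock) into a talking-clock sentence.
# The rounding and wording rules are assumptions, not Festival's own code.
generate_sentence() {
    hour=${1%%:*}
    minute=${1##*:}
    # Drop one leading zero so that values like "08" are not read as octal.
    hour=${hour#0};     hour=${hour:-0}
    minute=${minute#0}; minute=${minute:-0}

    # Qualifier, chosen from the minute's offset within its five-minute block.
    case $((minute % 5)) in
        0) approx="exactly" ;;
        1) approx="just after" ;;
        2) approx="a little after" ;;
        *) approx="almost" ;;
    esac

    # Round to the nearest five-minute mark (0, 5, ..., 60).
    rounded=$(( (minute + 2) / 5 * 5 ))
    case $rounded in
        5)  phrase="five past" ;;
        10) phrase="ten past" ;;
        15) phrase="quarter past" ;;
        20) phrase="twenty past" ;;
        25) phrase="twenty-five past" ;;
        30) phrase="half past" ;;
        35) phrase="twenty-five to" ;;
        40) phrase="twenty to" ;;
        45) phrase="quarter to" ;;
        50) phrase="ten to" ;;
        55) phrase="five to" ;;
        *)  phrase="" ;;   # 0 or 60: on the hour
    esac

    # "to" phrases, and minutes that round up to 60, name the next hour.
    spoken=$hour
    if [ "$rounded" -gt 30 ]; then
        spoken=$((hour + 1))
    fi
    case $((spoken % 12)) in
        0)  name="twelve" ;;
        1)  name="one" ;;
        2)  name="two" ;;
        3)  name="three" ;;
        4)  name="four" ;;
        5)  name="five" ;;
        6)  name="six" ;;
        7)  name="seven" ;;
        8)  name="eight" ;;
        9)  name="nine" ;;
        10) name="ten" ;;
        11) name="eleven" ;;
    esac

    # Part of the day, from the hour as given (assumed boundaries: 12 and 18).
    if [ "$hour" -lt 12 ]; then
        part="morning"
    elif [ "$hour" -lt 18 ]; then
        part="afternoon"
    else
        part="evening"
    fi

    if [ -n "$phrase" ]; then
        echo "The time is now, $approx $phrase $name, in the $part."
    else
        echo "The time is now, $approx $name, in the $part."
    fi
}
```

Calling generate_sentence 14:22 prints "The time is now, a little after twenty past two, in the afternoon." Your own domain will need its own cases, and every sentence the script can produce should be covered by the prompts you recorded.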

Use this Perl script as a basis. Update the variables $USERNAME and $TOPIC, complete the code where marked, and define the function generate_sentence, which does the input-output transformation. Please comment your code thoroughly.

The rest of the code in this script creates a temporary Festival script and runs it. That temporary Festival script loads your limited domain and creates a wav file with the resulting synthesis (you did exactly the same thing by hand in Part A for the time domain). You should not need to modify any of this.

Note: It is also possible to do the transformation part of the script (input string to English sentence) using the Scheme programming language, and later import your script into Festival. If you want to do it that way, please check with the TA first.

Documentation

Submission


Create a folder YourIdentikey-PROJ (e.g., fb2175-PROJ) and create three subdirectories in it: parta, partb, and partc. YOU MUST FOLLOW THESE CONVENTIONS EXACTLY.

1. For Part A, save your three wav files (time1.wav, time2.wav, and time3.wav; see Part A) in the parta subfolder.

2. For Part B, save the following files in partb subfolder:


3. For Part C, save the following files in partc subfolder:

Now, compress (in zip format only) the main folder to YourIdentikey-PROJ.zip (e.g., fb2175-PROJ.zip). Submit this zip file in CULEARN.
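
The folder layout can be created in one step. This is a small sketch that assumes the subdirectory names parta, partb and partc given above; make_submission_dirs is a hypothetical helper name.

```shell
# Create IDENTIKEY-PROJ with its three subdirectories in the current
# directory and print the name of the top-level folder.
make_submission_dirs() {
    top="$1-PROJ"
    mkdir -p "$top/parta" "$top/partb" "$top/partc"
    echo "$top"
}
```

For example, run make_submission_dirs fb2175, copy your files into the three subfolders, and then create the archive with zip -r fb2175-PROJ.zip fb2175-PROJ (this assumes the zip command is installed).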

Tips for recording (for parts A and C only)

Further tips for part C