CS4706 - Spring 2006
Homework 3 - "Building a TTS System"

In this homework, you are going to build your own limited-domain text-to-speech system. This is a complex task, so the homework is divided into three simpler subtasks, which we call Part A, Part B and Part C.

Part A: Setup and first small TTS system

Due: Wednesday, March 1, 2006, by 2:40pm.
Submission procedure explained below.

IMPORTANT: This part of the homework cannot be done remotely. You have to do it on one of the Linux computers in the Speech Lab, for which you need to sign up first.

Note #1: If you also want to install the software on your own Linux computer, ask Agus for instructions.

Note #2: It might also be possible to do this homework locally on one of the Linux computers in the CLIC Lab. However, we have not tested the software there, and we cannot guarantee that it will work.

A.1. Setup

Add these lines to your ~/.profile file:

export PATH=$PATH:/proj/speech/users/cs4706/speech/festival/bin/
export PATH=$PATH:/proj/speech/users/cs4706/speech/speech_tools/bin/
export FESTVOXDIR=/proj/speech/users/cs4706/speech/festvox
export ESTDIR=/proj/speech/users/cs4706/speech/speech_tools

Remember that you have to log out and back in for the changes to the ~/.profile file to take effect. You can check if the changes are in effect by running echo $FESTVOXDIR. The value specified above should be displayed.
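The check described above can be wrapped in a small helper. This is only a minimal sketch: the function name check_tts_env is made up for illustration, and it does nothing more than report whether the two variables are set and whether festival is found on your PATH.

```shell
# Hypothetical helper (not part of Festival or Festvox).
# Reports whether the variables from ~/.profile are visible in this shell.
# Returns 0 only if FESTVOXDIR and ESTDIR are both set and non-empty.
check_tts_env() {
    status=0
    for name in FESTVOXDIR ESTDIR; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
            status=1
        fi
    done
    # command -v prints the location of festival if PATH was extended correctly.
    command -v festival || echo "festival not found on PATH"
    return $status
}
```

Run check_tts_env after logging back in; if it reports a missing variable, review the lines you added to ~/.profile.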

A.2. Small TTS system: a talking clock

Log on locally to a Linux computer in the Speech Lab. Start X-windows with the command startx after logging in. Once on X-windows, open a Terminal window (Start menu » System Tools » Terminal) and run the following commands, replacing USERNAME with the user name of your CS account (e.g. jfk2101).

A detailed explanation of each step is available at http://festvox.org/bsv/x1003.html.


Step 1. Create a directory and cd into it.

    mkdir /proj/speech/users/cs4706/USERNAME
    cd /proj/speech/users/cs4706/USERNAME
    mkdir time
    cd time

Step 2. Set up the directory.

    $FESTVOXDIR/src/ldom/setup_ldom SLP time xyz

At this point, take a look at these two files:
etc/time.data, which contains a set of utterances that should cover all the possible variations in the domain;
festvox/SLP_time_xyz.scm, which defines several functions (in Scheme) to convert a time like "07:57" into an utterance like "The time is now, a little after five to eight, in the morning".
In order to build a new limited domain it is necessary to rewrite these files. For this part of the homework you do not need to edit these files, but you will in Parts B and C.

Step 3. Generate prompts.

    festival -b festvox/build_ldom.scm '(build_prompts "etc/time.data")'

Step 4. Record prompts. You need a microphone for this step. You will be asked to read out 24 prompts. Read the recording tips before starting.

    bin/prompt_them etc/time.data

Step 5. Autolabel prompts.

    bin/make_labs prompt-wav/*.wav

Step 6. Build utterances.

    festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'

Step 7. Extract pitchmarks and fix them.

    bin/make_pm_wave wav/*.wav
    bin/make_pm_fix pm/*.pm

Step 8. Power normalization.

    bin/simple_powernormalize wav/*.wav

Step 9. MCEP vectors.

    bin/make_mcep wav/*.wav

Step 10. Build LDOM synthesizer.

    festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'

Step 11. Run your synthesizer.

    festival festvox/SLP_time_xyz_ldom.scm '(voice_SLP_time_xyz_ldom)'

Step 12. Once in Festival, use these commands to make your synthesizer say the time. Use CTRL+D to exit Festival.

    (saytime)
    (saythistime "07:57")
    (saythistime "14:22")

Documentation

Submission

  1. To grade your homework, we need to run your TTS system. Therefore, please double check that all your files under /proj/speech/users/cs4706/USERNAME have read permissions for everybody, and also exec permissions in the case of directories (that is the default setting, so if you have not chmoded anything, you do not need to change anything).
  2. Submit by e-mail to agus [at] cs.columbia.edu the three wav files generated by your TTS with the following procedure:

    cd /proj/speech/users/cs4706/USERNAME/time
    festival

    (load "festvox/SLP_time_xyz_ldom.scm")
    (voice_SLP_time_xyz_ldom)

    (Parameter.set 'Audio_Method 'Audio_Command)
    (Parameter.set 'Audio_Required_Rate 16000)
    (Parameter.set 'Audio_Required_Format 'wav)

    (Parameter.set 'Audio_Command "cp $FILE time1.wav")
    (saytime)

    (Parameter.set 'Audio_Command "cp $FILE time2.wav")
    (saythistime "07:57")

    (Parameter.set 'Audio_Command "cp $FILE time3.wav")
    (saythistime "14:22")

Submit all your files in one compressed file named "hw3a-YourUni", with the corresponding extension (.gz, .zip, .rar, etc.). Example: "hw3a-ag2251.zip".
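For item 1 of the submission list, the sketch below lists anything that is not readable by everybody and then opens it up. The helper names find_unreadable and open_up are hypothetical; the octal modes assume standard Unix permission bits (004 is read for others, 005 is read plus exec for others).

```shell
# Hypothetical helpers for the permission check; not course-provided tools.

# Print every file that others cannot read, and every directory
# that others cannot both read and enter.
find_unreadable() {
    find "$1" \( -type f ! -perm -004 \) -o \( -type d ! -perm -005 \)
}

# Give everybody read access. The capital X adds exec permission only to
# directories and to files that are already executable for someone.
open_up() {
    chmod -R a+rX "$1"
}
```

For example, find_unreadable /proj/speech/users/cs4706/USERNAME should print nothing once the permissions are right; if it prints paths, open_up on the same directory is one way to fix them.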


Part B: Preparing your limited-domain TTS

Due: Wednesday, March 22, 2006, by 2:40pm.
Submission procedure explained below.

IMPORTANT: All steps in Part B can be done remotely.

B.1. Choosing a limited domain and two speaking styles

In Part A, you built a simple talking-clock TTS system. Now, you will build a TTS system with a limited domain of your choice. And, instead of recording it using a neutral voice, you will record two versions of the same system, each with a distinct style. You will also choose the manner in which the two speaking styles will differ.

So, first of all, choose a domain from the following list:

Next, choose the manner in which the two speaking styles will differ:

Choose carefully how the styles will differ, taking into account that you will have to record them using your own voice. Think about how you will elicit them so that they are clearly distinguishable from each other.

B.2. Designing the input and output of your TTS system

Define as formally as possible what the input and output of your TTS system is going to look like.

Your limited domain must have at least three degrees of freedom, and you have to provide an estimate of the number of possible sentences that could be generated.
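One simple way to obtain the estimate is to multiply the number of values that each degree of freedom can take. The sketch below does this for a made-up weather domain with three degrees of freedom (7 days, 5 conditions, 40 temperatures); both the numbers and the function name count_sentences are assumptions for illustration only.

```shell
# Hypothetical helper: multiply the sizes of all degrees of freedom.
count_sentences() {
    total=1
    for n in "$@"; do
        total=$((total * n))
    done
    echo "$total"
}

# Made-up domain: 7 days x 5 conditions x 40 temperatures.
count_sentences 7 5 40   # prints 1400
```

If some combinations cannot occur in your domain, the product is an upper bound rather than an exact count.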

For each of the two speaking styles you will create a full instance of a Festival TTS system. They will be named style1 and style2, and they will share the limited domain, as well as the definition of input and output.

In Part C you will record each instance separately, adapting your own voice in order to convey the desired personality, or to address the desired audience.

B.3. Setting things up

Log on to clic.cs.columbia.edu with your CS account (either locally or remotely), and run the following commands, where USERNAME is the user name of your CS account (e.g. jfk2101), and TOPIC is a string such as 'number', 'weather', 'street', etc.:

Step 1. Create a directory for each speaking style.

    mkdir /proj/speech/users/cs4706/USERNAME/style1
    mkdir /proj/speech/users/cs4706/USERNAME/style2

Step 2. Set up the directory for the first speaking style.

    cd /proj/speech/users/cs4706/USERNAME/style1
    $FESTVOXDIR/src/ldom/setup_ldom SLP TOPIC xyz

Step 3. Set up the directory for the second speaking style.

    cd /proj/speech/users/cs4706/USERNAME/style2
    $FESTVOXDIR/src/ldom/setup_ldom SLP TOPIC xyz

B.4. Designing the prompts

Next, you need to design the prompts for your TTS system. As you saw in Part A, the talking clock uses the prompts in time/etc/time.data:

( time0001 "The time is now, exactly five past one, in the morning." )
( time0002 "The time is now, just after ten past two, in the morning." )
( time0003 "The time is now, a little after quarter past three, in the morning." )
...

Now, you have to create a similar file for your domain, and save it twice, as style1/etc/TOPIC.data and style2/etc/TOPIC.data.
Both files should always be identical, so if you modify one of them do not forget to modify the other one too.
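A quick way to confirm that the two copies have not drifted apart is to compare them byte by byte. This is a minimal sketch; the function name prompts_in_sync is hypothetical, and it assumes you run it from /proj/speech/users/cs4706/USERNAME so that the relative paths resolve.

```shell
# Hypothetical helper: succeeds only if both styles have identical prompt files.
# Argument: the TOPIC string used when the directories were set up.
prompts_in_sync() {
    cmp -s "style1/etc/$1.data" "style2/etc/$1.data"
}
```

For example, after editing style1/etc/TOPIC.data you could run prompts_in_sync TOPIC and, if it fails, copy the edited file over the other one.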

For an explanation of how to design the prompts, go to http://www.festvox.org/bsv/bsv-ldom-ch.html#AEN952.

Documentation

Submission

Submit by e-mail to agus [at] cs.columbia.edu the following files:


Submit all your files in one compressed file named "hw3b-YourUni", with the corresponding extension (.gz, .zip, .rar, etc.). Example: "hw3b-ag2251.zip".


Part C: Completing your limited-domain TTS

Due: Monday, April 10, 2006, by 2:40pm.
Submission procedure explained below.

IMPORTANT: Section C.2 of this part of the homework cannot be done remotely. You have to do it on one of the Linux computers in the Speech Lab, for which you need to sign up first.

Note: It might also be possible to do this homework locally on one of the Linux computers in the CLIC Lab. However, we have not tested the software there, and we cannot guarantee that it will work.

C.1. Introduction

In Part B, you started building your limited-domain TTS system. You defined its input and output, designed the set of prompts you will record, and decided how the two speaking styles will differ.

Now, in order to complete your TTS system you need to:
a) record the set of prompts, once with each speaking style (section C.2);
b) write a script that transforms an input string into an English sentence, and sends it to Festival to synthesize it (section C.3).

C.2. Recording and processing the prompts

Log on locally to a Linux computer in the Speech Lab. Start X-windows with the command startx after logging in. Once on X-windows, open a Terminal window (Start menu » System Tools » Terminal) and run the following commands, replacing USERNAME with the user name of your CS account (e.g. jfk2101), and TOPIC with a string such as 'number', 'weather', 'street', etc.

Before starting, read the tips for part C, which you may find useful.

Step 1. Go to the directory of the first speaking style.

    cd /proj/speech/users/cs4706/USERNAME/style1

Step 2. The file etc/TOPIC.data should contain the prompts you designed. Make sure that its syntax is correct.

Step 3. Generate prompts.

    festival -b festvox/build_ldom.scm '(build_prompts "etc/TOPIC.data")'

Step 4. Record prompts. You need a microphone for this step. You will be asked to read out, one by one, the prompts you designed. Before starting, review the recording tips.

    bin/prompt_them etc/TOPIC.data

Step 5. Autolabel prompts.

    bin/make_labs prompt-wav/*.wav

Step 6. Build utterances.

    festival -b festvox/build_ldom.scm '(build_utts "etc/TOPIC.data")'

Step 7. Extract pitchmarks and fix them.

    bin/make_pm_wave wav/*.wav
    bin/make_pm_fix pm/*.pm

Step 8. Power normalization.

    bin/simple_powernormalize wav/*.wav

Step 9. MCEP vectors.

    bin/make_mcep wav/*.wav

Step 10. Build LDOM synthesizer.

    festival -b festvox/build_ldom.scm '(build_clunits "etc/TOPIC.data")'

Step 11. Run Festival (use CTRL+D to exit).

    festival festvox/SLP_TOPIC_xyz_ldom.scm '(voice_SLP_TOPIC_xyz_ldom)'

Step 12. Try out your synthesizer from within Festival.

Now, you can make your synthesizer say sentences in your domain by hand. For example, in the time domain you could do that by running: (SayText "The time is now, a little after twenty past two, in the afternoon.")

Warning: If you use words that do not belong to your domain, the synthesizer will fail.
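To catch such words before calling SayText, you can compare a sentence against the words that appear in your prompt file. The sketch below is a rough word-level check only: the function name unknown_words is hypothetical, and it assumes that lowercasing and splitting on anything other than letters and apostrophes is close enough to how your prompts are tokenized (hyphenated words are split in two).

```shell
# Hypothetical helper: print the words of a sentence that never occur
# in the quoted text of a prompt file such as etc/TOPIC.data.
unknown_words() {
    data=$1
    sentence=$2
    # Vocabulary: quoted text of each prompt, lowercased, one word per line.
    vocab=$(sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' "$data" |
            tr 'A-Z' 'a-z' | tr -cs "a-z'" '\n' | sort -u)
    # Split the sentence the same way and report words missing from vocab.
    printf '%s\n' "$sentence" | tr 'A-Z' 'a-z' | tr -cs "a-z'" '\n' |
    while read -r word; do
        [ -z "$word" ] && continue
        printf '%s\n' "$vocab" | grep -qxF -- "$word" || echo "$word"
    done
}
```

An empty result does not guarantee that synthesis will succeed; it only rules out words that were never recorded.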

Now repeat steps 1 to 12, this time using style2 instead of style1 in the first step. Be very careful with this, or you may overwrite your previous recordings!
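For the syntax check in step 2, the sketch below flags lines that do not look like the entries shown in Part B, that is, ( name "text" ). The function name bad_prompt_lines is hypothetical, and the pattern assumes that utterance names contain only letters, digits and underscores and that the text contains no embedded double quotes.

```shell
# Hypothetical helper: print (with line numbers) every non-blank line of a
# prompt file that does not have the form: ( name "text" )
bad_prompt_lines() {
    grep -n -v -E '^[[:space:]]*$|^[[:space:]]*\([[:space:]]*[A-Za-z0-9_]+[[:space:]]+"[^"]*"[[:space:]]*\)[[:space:]]*$' "$1"
}
```

Running bad_prompt_lines etc/TOPIC.data in each style directory should print nothing; any line it prints is worth inspecting before you start recording.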

C.3. Writing a script that transforms the input and synthesizes it

Now, you need to write a script that receives an input string as you defined in Part B, and transforms it into an English sentence that your TTS system can synthesize. For example, in the time domain the script would transform a string like "14:22" into a sentence like "The time is now, a little after twenty past two, in the afternoon." A simple (and perhaps long) case or switch statement should be enough to achieve this.
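To illustrate the case-statement idea, here is a minimal sketch for the time domain. It is written in shell rather than Perl, purely to show the logic; it is not the course template. The rounding rules, the day-part boundaries (before 12 is morning, before 18 is afternoon) and wordings such as "twenty-five past" are assumptions chosen to reproduce the example sentences in this handout, not the actual rules in festvox/SLP_time_xyz.scm.

```shell
# Sketch only: turn "HH:MM" into a sentence for the time domain.
generate_sentence() {
    hh=${1%%:*}
    mi=${1##*:}
    # Drop one leading zero so "08" is not read as an invalid octal number.
    hour=${hh#0};   hour=${hour:-0}
    minute=${mi#0}; minute=${minute:-0}

    # Part of the day, decided from the hour as given (assumed boundaries).
    if   [ "$hour" -lt 12 ]; then part="in the morning"
    elif [ "$hour" -lt 18 ]; then part="in the afternoon"
    else                          part="in the evening"
    fi

    # Distance from the last five-minute mark picks the hedging word;
    # 3 or 4 minutes past rounds up to the next mark.
    rest=$((minute % 5))
    base=$((minute - rest))
    case $rest in
        0) approx="exactly" ;;
        1) approx="just after" ;;
        2) approx="a little after" ;;
        *) approx="almost"; base=$((base + 5)) ;;
    esac

    # After the half hour the sentence counts down to the next hour.
    if [ "$base" -gt 30 ]; then hour=$((hour + 1)); fi

    case $base in
        0|60) mins="" ;;
        5)    mins="five past " ;;
        10)   mins="ten past " ;;
        15)   mins="quarter past " ;;
        20)   mins="twenty past " ;;
        25)   mins="twenty-five past " ;;
        30)   mins="half past " ;;
        35)   mins="twenty-five to " ;;
        40)   mins="twenty to " ;;
        45)   mins="quarter to " ;;
        50)   mins="ten to " ;;
        55)   mins="five to " ;;
    esac

    case $((hour % 12)) in
        0) name="twelve" ;; 1) name="one" ;;   2) name="two" ;;
        3) name="three" ;;  4) name="four" ;;  5) name="five" ;;
        6) name="six" ;;    7) name="seven" ;; 8) name="eight" ;;
        9) name="nine" ;;   10) name="ten" ;;  11) name="eleven" ;;
    esac

    echo "The time is now, $approx $mins$name, $part."
}

generate_sentence "14:22"
```

Your own generate_sentence will follow the same pattern: parse the input, map each degree of freedom to words that appear in your prompts, and assemble the sentence.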

Use this Perl script as a basis. Update the variables $USERNAME and $TOPIC, complete the code where marked, and define the function generate_sentence, which does the input-output transformation. Please comment your code thoroughly.

The rest of the code in this script creates a temporary Festival script and runs it. That temporary Festival script loads your limited domain with the selected speaking style, and creates a wav file with the resulting synthesis (you did exactly the same thing by hand in Part A for the time domain). You should not need to modify any of this.
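To give an idea of what such a temporary Festival script contains, the sketch below prints one, reusing the Festival commands from the Part A submission procedure. It is an illustration in shell, not the code of the provided Perl script; the function name write_festival_script is hypothetical, and the sentence is assumed to contain no double quotes.

```shell
# Hypothetical helper: print a Festival script that loads the limited-domain
# voice for TOPIC and saves the synthesis of SENTENCE to the file WAV.
# Arguments: TOPIC, absolute path of WAV, SENTENCE.
write_festival_script() {
    topic=$1
    wav=$2
    sentence=$3
    # \$FILE is escaped so the shell leaves it for Festival to fill in.
    cat <<EOF
(load "festvox/SLP_${topic}_xyz_ldom.scm")
(voice_SLP_${topic}_xyz_ldom)
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Required_Rate 16000)
(Parameter.set 'Audio_Required_Format 'wav)
(Parameter.set 'Audio_Command "cp \$FILE $wav")
(SayText "$sentence")
EOF
}
```

Because the voice file is loaded with a relative path, as in Part A, the script would be run from inside the style directory, for example by redirecting the output to a file, changing to /proj/speech/users/cs4706/USERNAME/style1, and running festival -b on that file.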

Note: It is also possible to do the transformation part of the script (input string to English sentence) using the Scheme programming language, and later import your script into Festival. If you want to do it that way, please check with Agus first.

Documentation

Submission

  1. To grade your homework, we need to run your TTS system. Therefore, please double check that all your files under /proj/speech/users/cs4706/USERNAME have read permissions for everybody, and also exec permissions in the case of directories and scripts.
  2. Submit by e-mail to agus [at] cs.columbia.edu the following files:

A complete example: cards

As an illustration, in /proj/speech/users/cs4706/speech/tts-cards there is a full example of a TTS system (with only one speaking style), embedded in a Java application.

  1. cards/ contains all Festival and Festvox files.
  2. cards/etc/cards.data is the file with the prompts designed for this particular domain.
  3. CardToSpeech.pl is a Perl script similar to the one you have to prepare in Section C.3. Run it with no arguments to find out its usage and its input format. Make sure to enter the absolute path of the wav file (somewhere you have write permissions).
  4. Run locally from a terminal on X-windows: java CardToSpeech
    This is a Java application that uses the previous script to synthesize descriptions of cards. Have fun!
Note: The Java application uses the Perl script CardToSpeech.pl. We only want to show you how such a script can later be used within a front-end application. For our course, building such an application is completely optional (but recommended if you have the time!).

Tips for recording (for parts A and C only)

Further tips for part C