LSA 7800-074: Text-to-Speech Synthesis, Summer 2011

Time: Tues/Fri 10:30-12:15
Place: ECCE 152

Professor Julia Hirschberg (Office Hours TBD)



Text-to-Speech synthesis (TTS) is the technology behind the speech generation found in most Spoken Dialogue Systems. The goal of TTS research is to produce speech that sounds as natural as speech a human would produce -- using only text as input. In this class, we will explore the different components of current TTS systems, including text analysis, pronunciation assignment, intonation assignment, and speech realization, and how many of these might be improved using more linguistic knowledge. We will examine existing commercial systems and develop evaluation procedures for them. Students will work in pairs to build simple TTS systems of their own from Festival TTS components.


Students should have a basic knowledge of one scripting language (e.g. Perl, Python) or find another class member with such knowledge to partner with.  

Academic Integrity

Copying or paraphrasing someone else's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work.

Required texts:

    Daniel Jurafsky and James H. Martin, Speech and Language Processing (second edition). Pearson: Prentice Hall, 2009. Chapter 8. See the errata before you do each reading assignment in case there are updates. An online version is available.

    Other required readings are available online via links from this syllabus.


Requirements:

  • Course project and class participation.





Date | Topic | Reading Assignments | HW Due Dates and Other Assignments
July 8 | Introduction and Speech Generation Overview | J&M 8 (pp. 249-50, 281-84); TTS-history; Historical examples | J&M 8 pp. 281-284
July 12 | Text Normalization | J&M 8.1; Sproatetal01 | Project assignment: Option 1; Option 2 -- Festival-for-Mac
July 15 | Modeling Pronunciation | J&M 8.2; Ghoshaletal09 |
July 19 | Building a TTS System and TTS Evaluation | Black-Festival-Notes*; J&M 8.6 |
July 22 | Prosody Modeling | Hirschberg03; J&M 8.3.0-8.3.6; ToBI labeling conventions |
July 26 | Predicting Prosody from Text | J&M 8.3.7 |
July 29 | Information Status: Focus and Given/New | GBrown83; Prince92; Terken&Hirschberg93 |
Aug 2 | Backend Synthesis | J&M 8.4-5, 8.6; Tokudaetal02 | Projects due


Links to Resources

See also the resources available from the textbook homepage.

Places to look up definitions and descriptions of terminology:

  1. Oxford Dictionary of Linguistics
  2. Interesting Language Factoids and Non

Other resources

  1. Karen Chung Language and Linguistics links
  2. CatSpeak
  3. On-line dictionaries in many languages.
  4. Festival speech synthesizer demo and links to other TTS systems
  5. Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial


Julia Hirschberg
Professor, Computer Science

Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027

email: julia@cs.columbia.edu
phone: (212) 939-7114

