Julia Hirschberg

CS4706: Spoken Language Processing, Spring 2012

Time: Mon/Wed 2:40-3:55
Place: Seeley Mudd 233

Professor Julia Hirschberg (Office Hours M 4:15-6:15 pm)
julia@cs.columbia.edu, 212-939-7114

Teaching Assistants

Rivka Levitan (Office Hours TBD) rlevitan@cs.columbia.edu, 212-939-7147

Erica Cooper (Office Hours TBD) ecooper@cs.columbia.edu, 212-939-7122

Description

This course introduces students to research in spoken language in computational linguistics, aka natural language processing (NLP). We will study the different 'meanings' that can be conveyed by the way that speakers produce sentences, techniques for analyzing spoken language, methods of developing speech technologies such as text-to-speech systems and speech recognition systems, and applications of speech technologies in the real world, such as spoken dialogue systems (SDS). Students will build an SDS in a domain of their choice, working in small teams. NB: This course can be counted as a PhD elective in Advanced AI. It is a requirement for the MS NLP Track. There are no official prerequisites for this course except Data Structures or equivalent, and no prior knowledge of NLP will be assumed.

Requirements

The major requirements of the course are a midterm, a final, and a 3-part class project. Class participation will also contribute to your final grade. The project involves building a spoken dialogue system in a domain of your choice. You will build a text-to-speech (TTS) system and an automatic speech recognition (ASR) system from components we will provide; the dialogue component will involve building a simple system that puts inputs and outputs together to accomplish some interesting and useful or fun task. You are encouraged to do these projects in teams of 2-3. There will be several project deadlines during the term at which we evaluate your project description, your TTS system, your ASR system, and the overall project. You are allowed a total of 5 late days across all project deadlines, no questions asked; after that, 10% per late day will be deducted from the grade for that component, unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.
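
As a rough illustration of the late-day arithmetic described above, here is a minimal sketch. It is not an official grading tool. It assumes the 10% is taken from the component's full grade for each late day beyond your remaining free days, and that the deduction is capped at 100%; the doctor's-note exception is not modeled. The function name and parameters are hypothetical.

```python
from __future__ import annotations


def late_penalty(days_late: int, free_days_left: int, rate: float = 0.10) -> tuple[float, int]:
    """Return (fraction deducted from this component, free late days remaining).

    Assumption: free late days are consumed first; each remaining late day
    costs `rate` of the component grade, capped at a 100% deduction.
    """
    if days_late < 0 or free_days_left < 0:
        raise ValueError("days_late and free_days_left must be non-negative")
    used = min(days_late, free_days_left)      # free days spent on this deadline
    penalized = days_late - used               # days that actually cost points
    deduction = min(1.0, penalized * rate)     # cap the deduction at 100%
    return deduction, free_days_left - used


if __name__ == "__main__":
    # Two days late with all 5 free days available: no deduction, 3 days left.
    print(late_penalty(2, 5))
    # Four days late with only 3 free days left: one penalized day.
    print(late_penalty(4, 3))
```

Under these assumptions, a team that spends all 5 free days on the first deadline pays the per-day deduction on every later slip, which is why the advice is to save late days for real emergencies.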

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account". The Speech Lab is available for use in homeworks as needed on a signup basis. Some parts of the project must be done in the Lab.

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to Prof. Hirschberg or to Robert Coyne in advance of the due date. Please see the university policy.

Required texts:

Daniel Jurafsky and James H. Martin. Speech and Language Processing (second edition). Pearson: Prentice Hall. 2009. See the errata before you do each reading assignment; there are some typos in the algorithms.

Other required readings are available online via links from this syllabus.

Grading:

40% Exams

60% Course Project

Class participation will be taken into account in calculating the final grade.

Homework and project submission procedure is described HERE.

Lab Signup.

Sign up to use the Linux computers in the Speech Lab.

Resources

· Sox - audio file editing

· Help using ToBI - ToBI Annotation Environments

· Text-to-Speech Links and more...

· Praat - Praat resources

Syllabus

Date	Topic	Reading Assignments	HW Due Dates and Other Assignments
Jan 18	It's not what you said, it's how you said it [pdf]
Jan 23	From Sounds to Language [pdf]	J&M 7.1-7.3, 7.5
Jan 25	Acoustics of Speech [pdf]	J&M 7.4
Jan 30	Tools for Speech Analysis [pdf]	Praat tutorial 1	Project Description due. Download Praat to your laptop if you have one and bring it to class, with headphones if you have them.
Feb 1	More on Praat and Lab Visit		"
Feb 6	Speech Generation Overview [pdf]	J&M 8 (pp. 249-50, 281-84); TTS-history; Historical examples
Feb 8	Building a TTS System [pdf]	Black-Festival-Notes	Project Part 1 (TTS) assigned
Feb 13	Text Normalization [pdf]	J&M 8.1, Sproatetal01
Feb 15	Modeling Pronunciation [pdf]	J&M 8.2; Ghoshaletal09
Feb 20	Prosody Modeling [pdf]	Hirschberg03, J&M 8.3.0-8.3.4, ToBI labeling conventions	Download and listen to all the ToBI examples. Prepare these exercises and bring them to class with your laptop and headphones.
Feb 22	Predicting Prosody from Text [pdf]	J&M 8.3.4-8.3.7
Feb 27	Information Status: Focus and Given/New [pdf]	GBrown83, Prince92, Terken&Hirschberg93
Feb 29	TTS Evaluation [pdf]
Mar 5	Backend Synthesis [pdf], HMM Synthesis [pdf]	J&M 8.4-5, 8.6; Tokudaetal02	Project Part 1 due; Project Part 2 (ASR) assigned
Mar 7	Midterm		NB: Please deposit the exercises you did for Feb 20 in Courseworks before class.
Mar 12-16	Spring Break
Mar 19	ASR: Overview [pdf]	J&M 9-9.2, 6-6.3
Mar 21	Building an ASR System	J&M 9.3-9.7	Fadi Biadsy
Mar 26	Language Modeling and Grammars [pdf]	J&M 4, 9.5
Mar 28	ASR Evaluation [pdf]	J&M 9.8
Apr 2	Human Speech Perception [pdf]	J&M 10.7	Project Part 2 due; Project Part 3 (SDS) assigned
Apr 4	Metadata: Speaker, Sentence and Topic Segmentation and Disfluencies [pdf]	J&M 10.5, Liuetal04, Liuetal05, Snoveretal04
Apr 9	Spoken Dialogue: Human and Machine [pdf]	J&M 24-24.1, 24.8
Apr 11	SDS System Architectures	J&M 24.2-3, Goldberg03
Apr 16	Managing Interaction [pdf]
Apr 18	Dialogue Acts and Information State	J&M 24.5, Hirschbergetal04
Apr 23	Dialogue Acts and Information State (2)
Apr 25	SDS Evaluation [pdf]	J&M 24.4, Walkeretal97	Preliminary Project Demos
Apr 30	Final Exam
May 1-3	Study Days
May 4-11	Project Demos (1:10-4)	Interschool Lab, 750 CEPSR	Project Part 3 due

Links to Resources

cf. also resources available from the text homepage

Places to look up definitions and descriptions of terminology:

Other resources

Karen Chung Language and Linguistics links
CatSpeak
Check out Eliza
AT&T Labs - Research Finite State Machine Library
Appelt and Israel's information extraction tutorial (IJCAI-99).
Framenet.
Ask Jeeves -- a search engine that answers questions in plain English.
Answer Bus -- another Q/A system.
Columbia's NewsBlaster summarizer
IBM summarizer demo (canned)
Systran machine translation (also in use at Babelfish)
Michael Collins' Parser
On-line dictionaries in many languages.
WordNet
CoBuildDirect Corpus
AT&T's SCANMail voicemail browsing/search system
DiaLeague 2001 -- includes a link to an online dialogue system demo.
James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial
Festival speech synthesizer demo and links to other TTS systems
Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial

Julia Hirschberg
Professor, Computer Science

Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027

email: julia@cs.columbia.edu
phone: (212) 939-7114