CS4706 - Spring 2008
Homework 3 - "Data Collection"


Due: Wed, Mar 3, 2010, by 2:40pm.
Submit in Courseworks

IMPORTANT: Submit all your files in one compressed file named "hw3-YourUni.zip" (example: "hw3-fb2175.zip").

 

I.      Collect 4 tokens of Barack Obama's speech, each from a different genre, e.g. a stump speech, a one-on-one interview, a debate, an infomercial -- you choose.


Save each token to a wav file whose name indicates the genre (e.g. barack-debate.wav), with the following constraints:

 

a.      Each token should be ~10 sec long, sampled at 16 kHz.

 

b.      The speech should be as clean as possible so that you can get a good pitch track when Obama is speaking; try to avoid overlapping applause and other speakers.
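
As a quick sanity check on constraint (a), a wav file's sample rate and duration can be read from its header. The following is only a minimal sketch using Python's standard wave module. The function names and the 2-second tolerance are assumptions made for illustration (the assignment only says "~10 sec"), and the script cannot judge how clean the recording is.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, duration_sec) read from a PCM wav file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def check_token(path, expected_rate=16000, target_sec=10.0, tolerance_sec=2.0):
    """Return a list of problems; an empty list means the file looks fine.

    tolerance_sec is an assumed reading of "~10 sec", not an official limit.
    """
    rate, duration = describe_wav(path)
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if abs(duration - target_sec) > tolerance_sec:
        problems.append(
            f"duration is {duration:.1f} s, expected about {target_sec:.0f} s"
        )
    return problems
```

For example, check_token("barack-debate.wav") returns an empty list when the file is sampled at 16 kHz and lasts roughly 10 seconds.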

 

II.      In a README file, you should annotate the following for each utterance:

a.      Filename

b.      Genre

c.      Where and when was the utterance made?

d.      What is *your own* native language (i.e., the first language you learned as a baby)?

 

III.      You should provide a full ToBI labeling for each utterance: orthographic tier, tonal tier (including HiF0), break index tier, and miscellaneous tier. On the miscellaneous tier, include time marks for all disfluencies in the speech, if any, and also indicate coughs or laughs. (Provide 4 TextGrid files, one per utterance, e.g., Barack-1.TextGrid.)

 

In short, your submission should include 4 wav files, 4 TextGrid files, and a README.
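
Before uploading, the contents of the zip file can be checked against the list above. The following is only a rough sketch using Python's standard zipfile module. The function names are hypothetical, and it assumes the README file's name begins with "README" (in any letter case), which the assignment does not specify.

```python
import zipfile
from pathlib import PurePosixPath


def summarize(names):
    """Count wav, TextGrid, and README files among archive member names."""
    counts = {"wav": 0, "textgrid": 0, "readme": 0}
    for name in names:
        # Ignore any folder prefix inside the archive; compare case-insensitively.
        base = PurePosixPath(name).name.lower()
        if base.endswith(".wav"):
            counts["wav"] += 1
        elif base.endswith(".textgrid"):
            counts["textgrid"] += 1
        elif base.startswith("readme"):  # assumed README naming
            counts["readme"] += 1
    return counts


def check_submission(zip_path):
    """Return True if the zip holds 4 wav files, 4 TextGrid files, and 1 README."""
    with zipfile.ZipFile(zip_path) as archive:
        counts = summarize(archive.namelist())
    return counts == {"wav": 4, "textgrid": 4, "readme": 1}
```

For example, check_submission("hw3-fb2175.zip") returns True only when all nine expected files are present in the archive.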