********************SAMPLE MIDTERM ONLY*****************************

1) Short Answers (provide a 2-3 sentence answer for 5 of the following 8): (5 points each = 25 points)

a) What is the Penn Treebank and why is it important?
b) Explain how you might decide whether 'child' is a noun or an adjective in "a child seat".
c) What is the difference between an adjunct and an argument in a Dependency Tree Grammar? Give examples of each.
d) Give two examples each of mass and count nouns, and give one example of a noun that is both.
e) Distinguish between phones, phonemes, and allophones, and explain why the distinction is important.
f) Place of articulation
g) Viterbi algorithm
h) Explain the difference between entropy and perplexity.

2) Exercises (Do 3 of the following 5): (15 points each = 45 points)

a) List as many ways as you can think of to convey the time expression "3:45 p.m." in English. Draw a finite-state automaton (or state-transition table) that recognizes these. Draw a finite-state transducer that translates them into 24-hour time.
b) Calculate the MED between the source word "brang" and each of two candidate corrections, "strange" and "blanch". Show the MED table for each. Use the Levenshtein distance with a cost of 2 for substitutions and 1 for insertions or deletions.
c) Draw an FSA that represents the language /ba+!/. Create a finite-state transducer that translates it into the language /mo+?/.
d) Write a grammar that covers the following fragment of English and construct a left-corner chart for this grammar:
      John gave a book to Mary.
      Give them a book.
      A book was given to John.
      They gave a very expensive book.
      The book was very expensive.
e) Calculate the bigram probability of the sentence "I want a British lunch", given the following unigram and bigram counts:

   Bigram counts:
                <s>     I   want     a  British  lunch
      <s>         0   859    357   452       22     57
      I           0     8   1087     0        0     10
      want        0     3      0    60       75    120
      a           0     0      0     2       55    205
      British     0     0     10     0        3     72
      lunch       0     4      0     0        4      0

   Unigram counts:
      <s> 10,000   I 3437   want 1215   a 6342   British 359   lunch 2768

3) Short answer (Provide ~1 paragraph on 2 of the following 4): (15 points each = 30 points)

a) Describe the training process for Brill's TBL part-of-speech tagger. Will sentences like "Time flies with a stopwatch" and "Time flies like an arrow" in the training data be a problem for Brill's approach? Why or why not?
b) You have just been hired to work on the pronunciation module for the Really Dumb Text-to-Speech System. The RD TTS system in particular can't pronounce proper names like "Anton Chekov", "Infiniti", or "antidisestablishmentarianism". What techniques can you recommend to improve this system?
c) Describe the major features of the Earley algorithm. What are its strengths and weaknesses?
d) Compare and contrast two of the smoothing procedures described in Jurafsky and Martin.
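
For study purposes, here is a minimal sketch of the dynamic-programming setup that Exercise 2b asks for: minimum edit distance with the stated costs (substitution 2, insertion/deletion 1). This is not part of the exam, and the function and variable names are my own choices, not anything specified in the question; the exam additionally asks you to show the full MED table, which this sketch only builds internally.

    # Minimum edit distance (Levenshtein) with substitution cost 2 and
    # insertion/deletion cost 1, as specified in Exercise 2b.
    def min_edit_distance(source, target, sub_cost=2, ins_cost=1, del_cost=1):
        n, m = len(source), len(target)
        # D[i][j] = cost of transforming source[:i] into target[:j]
        D = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):           # delete all of source[:i]
            D[i][0] = i * del_cost
        for j in range(1, m + 1):           # insert all of target[:j]
            D[0][j] = j * ins_cost
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if source[i - 1] == target[j - 1] else sub_cost
                D[i][j] = min(D[i - 1][j] + del_cost,      # deletion
                              D[i][j - 1] + ins_cost,      # insertion
                              D[i - 1][j - 1] + sub)       # substitution / match
        return D[n][m]

    print(min_edit_distance("brang", "strange"))   # first candidate correction
    print(min_edit_distance("brang", "blanch"))    # second candidate correction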
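Exercise 2c asks for drawings, but the transducer it describes can also be written out as an explicit state-transition table, which may help when checking your diagram. The sketch below is one possible encoding, assuming a character-level transducer with made-up state names (q0-q3); it accepts strings in /ba+!/ and emits the corresponding string in /mo+?/.

    # (state, input char) -> (next state, output char)
    transitions = {
        ("q0", "b"): ("q1", "m"),
        ("q1", "a"): ("q2", "o"),
        ("q2", "a"): ("q2", "o"),   # self-loop handles "one or more a's"
        ("q2", "!"): ("q3", "?"),
    }
    accepting = {"q3"}

    def transduce(s):
        state, out = "q0", []
        for ch in s:
            if (state, ch) not in transitions:
                return None                  # reject strings outside /ba+!/
            state, o = transitions[(state, ch)]
            out.append(o)
        return "".join(out) if state in accepting else None

    print(transduce("baaa!"))   # -> "mooo?"
    print(transduce("ba"))      # -> None (rejected)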
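For Exercise 2e, the computation is the maximum-likelihood bigram estimate P(w_i | w_(i-1)) = C(w_(i-1) w_i) / C(w_(i-1)), multiplied across the sentence. The sketch below just mechanizes that arithmetic with the counts from the tables above; note that reading the unlabeled first row/column of the bigram table and the unlabeled 10,000 unigram count as a sentence-start symbol <s> is my interpretation of the original (garbled) table, not something the exam states explicitly.

    # Counts copied from the tables above; <s> labeling is an assumption.
    bigram_counts = {
        ("<s>", "I"): 859,
        ("I", "want"): 1087,
        ("want", "a"): 60,
        ("a", "British"): 55,
        ("British", "lunch"): 72,
    }
    unigram_counts = {"<s>": 10000, "I": 3437, "want": 1215,
                      "a": 6342, "British": 359, "lunch": 2768}

    sentence = ["<s>", "I", "want", "a", "British", "lunch"]
    prob = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        # P(word | prev) = C(prev word) / C(prev)
        prob *= bigram_counts[(prev, word)] / unigram_counts[prev]
    print(prob)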