John's Research Page


I am interested in natural language processing and machine learning. Here is some information about my research.

Current Focus: Multi-document Text Summarization

Columbia Newsblaster is a multi-document summarization system that crawls the web each day for thousands of news articles. It then categorizes, clusters, and summarizes them.
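The cluster-then-summarize idea can be illustrated with a toy sketch. This is not Newsblaster's actual pipeline: the greedy word-overlap clustering and the centroid-style sentence picker below are simplified stand-ins for its real components.

```python
from collections import Counter

def tokenize(text):
    return [w.lower().strip(".,") for w in text.split()]

def cluster_articles(articles, threshold=3):
    """Greedy clustering: an article joins the first cluster that shares
    at least `threshold` words with it, otherwise it starts a new cluster."""
    clusters = []
    for article in articles:
        words = set(tokenize(article))
        for cluster in clusters:
            if len(words & cluster["vocab"]) >= threshold:
                cluster["articles"].append(article)
                cluster["vocab"] |= words
                break
        else:
            clusters.append({"articles": [article], "vocab": set(words)})
    return clusters

def summarize(cluster):
    """Extractive one-sentence summary: pick the sentence whose words best
    cover the cluster's most frequent terms (a centroid-style heuristic)."""
    counts = Counter(w for art in cluster["articles"] for w in tokenize(art))
    sentences = [s.strip() for art in cluster["articles"]
                 for s in art.split(".") if s.strip()]
    return max(sentences, key=lambda s: sum(counts[w] for w in tokenize(s)))
```

Real systems use far richer similarity measures and generate summaries rather than merely extracting a sentence, but the overall shape — group related articles, then condense each group — is the same.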

I am looking at extending Newsblaster. We would like Newsblaster to track important events across days, and to process not only news articles but also other kinds of information.

Surface Realization

FERGUS is a surface realizer. Its job is to determine the best way to say something, once the system has decided what must be said to the user.
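The core idea of choosing among candidate realizations can be sketched with a toy ranker. FERGUS's actual tree model and trigram language model are more sophisticated; the bigram-count scorer and the `best_realization` helper below are illustrative assumptions, not its real design.

```python
from collections import Counter
from itertools import permutations

def bigram_counts(corpus):
    """Count adjacent word pairs in a training corpus of sentences."""
    counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

def score(candidate, counts):
    """Score a candidate realization by summing its bigram counts."""
    tokens = ["<s>"] + candidate.split() + ["</s>"]
    return sum(counts[bigram] for bigram in zip(tokens, tokens[1:]))

def best_realization(words, counts):
    """Pick the ordering of `words` with the highest bigram score."""
    return max((" ".join(p) for p in permutations(words)),
               key=lambda c: score(c, counts))
```

Enumerating all permutations is exponential, which is exactly why a real realizer constrains word order with a grammar first and uses the language model only to rank the surviving candidates.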

I have investigated methods to ease porting FERGUS to different domains. These include the use of automatically generated linguistic resources to train FERGUS, as well as a graphical user interface for customizing it.

In other experiments, I have compared training FERGUS on very large, automatically parsed corpora versus moderately sized, hand-annotated ones. I have also investigated linguistically inspired features to improve FERGUS's output.

Automated Extraction of Tree-Adjoining Grammars from Treebanks

A grammar is a set of rules which, among other things, distinguishes grammatical from ungrammatical sentences in a language. A tree-adjoining grammar is a kind of grammar that has been found useful both computationally and linguistically.

Developing a tree-adjoining grammar by hand typically requires many years of human effort. As an alternative, I have written procedures that automatically extract linguistically plausible tree-adjoining grammars from a given treebank.
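A minimal sketch of the extraction idea, under strong simplifying assumptions: the tuple tree format with explicit head indices is invented for illustration (real treebanks require head-percolation rules to find heads), and only substitution is handled, not adjunction.

```python
def extract(tree, grammar):
    """Return the elementary tree whose spine follows head children down to
    a word; each non-head child is cut off as its own elementary tree,
    leaving a substitution node (label + "↓") in its place.
    A tree is (label, head_index, [children]); a string child is a word."""
    label, head, children = tree
    kept = []
    for i, child in enumerate(children):
        if isinstance(child, str):            # lexical anchor (a word)
            kept.append(child)
        elif i == head:                       # stay on the head spine
            kept.append(extract(child, grammar))
        else:                                 # cut off a non-head subtree
            grammar.append(extract(child, grammar))
            kept.append(child[0] + "↓")
    return (label, kept)

def extract_grammar(tree):
    """Decompose a head-annotated parse tree into elementary trees."""
    grammar = []
    grammar.append(extract(tree, grammar))
    return grammar
```

Run on a head-annotated tree for "John sleeps", this yields one elementary tree per word: an NP tree anchored by "John" and an S tree anchored by "sleeps" with an NP substitution slot.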

Supertagging

Parsing is the task of assigning the most appropriate parse tree to a given input sentence. Once the computer knows what that parse tree is, it can more easily figure out the meaning of the sentence. A perennial problem is how to perform parsing both efficiently and accurately.

Supertagging has been proposed as one technique for doing exactly that. It is a preprocessing step that chooses a tag for each word of the input sentence; these tags can significantly reduce the search space for a parser. I have looked at ways to make supertagging more accurate without sacrificing its efficiency. I have also investigated class-based tagging as a viable variant of supertagging.
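A bare-bones supertagger sketch can show how per-word tag pruning shrinks the search space. The per-word frequency lexicon below is an assumption for illustration; real supertaggers condition on the surrounding context, and the tag names are made up.

```python
from collections import Counter

def train(tagged_corpus):
    """Build a lexicon mapping each word to counts of its supertags,
    from sentences given as lists of (word, supertag) pairs."""
    lexicon = {}
    for sentence in tagged_corpus:
        for word, tag in sentence:
            lexicon.setdefault(word, Counter())[tag] += 1
    return lexicon

def supertag(sentence, lexicon, beam=1):
    """Keep only the `beam` most frequent supertags for each word; a parser
    then searches over these few tags instead of the whole grammar."""
    return [[tag for tag, _ in lexicon.get(word, Counter()).most_common(beam)]
            for word in sentence]
```

With `beam=1` the parser sees exactly one supertag per word; widening the beam trades efficiency back for accuracy, which is the tension the research above tries to resolve.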

Semantic Parsing

Semantic parsing is the task of computing the "meaning" of a given input sentence. The form of this meaning depends on the kind of semantic representation assumed. Here we assume local semantics, which means that the semantic role labels assigned to each predicate's arguments are consistent across syntactic alternations of different instances of the same predicate, but not necessarily across different predicates. We find that deep syntactic features make local semantics easier to predict than surface-oriented syntactic features do.
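A toy illustration of the deep-versus-surface contrast: the feature names and the passive-normalization rule below are invented for this example, not the actual feature set, but they show why normalizing alternations helps.

```python
def surface_features(arg):
    """Surface-oriented: where the argument sits relative to the verb."""
    return {"position": arg["position"]}

def deep_features(arg):
    """Deep syntax: grammatical function with the passive normalized,
    so a passive subject is treated as a deep object."""
    function = arg["function"]
    if arg.get("voice") == "passive" and function == "subject":
        function = "object"
    return {"function": function}

# "The janitor opened the door."  -> "the door" follows the verb.
active  = {"position": "after",  "function": "object",  "voice": "active"}
# "The door was opened."          -> "the door" precedes the verb.
passive = {"position": "before", "function": "subject", "voice": "passive"}
```

The surface features of "the door" differ across the two sentences, but its deep features coincide, so a single rule like "deep object of 'open' receives the Patient role" covers both instances of the predicate.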