- Who? PhD candidate in the Department of Computer Science at Columbia University. Here's a brief CV.
- Mail:
450 Computer Science Bldg
1214 Amsterdam Ave
Mailcode: 0401
New York NY 10027-7003
- Email:
- Phone: +1.212.939.7191
- Bring pizza: 724 Schapiro CEPSR
Research
I'm a member of the natural language processing group at Columbia advised by Kathy McKeown and have been involved with the machine learning group and the Center for Computational Learning Systems. My current research focuses on structured prediction approaches to problems such as text alignment and text-to-text generation. I'm particularly interested in inference strategies that incorporate multiple structural representations of text. This work is relevant to applications like summarization, question answering and machine translation.
Other projects that I'm currently involved in include approaches for automatically defining and surveying scientific concepts and graph-based models for summarizing web pages. I've previously worked on problems like redundancy reduction in text, semi-parametric density estimation, selectional preference discovery, time-series clustering, unsupervised syntactic language models, adaptive topic-based language modeling for speech recognition and semi-automated corpus annotation.
Refereed publications
-
In Proceedings of CoNLL 2013, Sofia, Bulgaria.
-
In Proceedings of COLING 2012, Mumbai, India.
-
In Proceedings of Interspeech 2012, Portland, Oregon.
-
In Proceedings of IJCNLP 2011, Chiang Mai, Thailand.
-
In Proceedings of the Workshop on Monolingual Text-to-Text Generation at ACL-HLT 2011, Portland, Oregon.
-
In Proceedings of ACL-HLT 2011, Portland, Oregon.
-
In Proceedings of ACL-HLT 2011, Portland, Oregon.
-
In Proceedings of NAACL-HLT 2010, Los Angeles, California.
-
In Proceedings of the Workshop on Creating Speech and Text Language Data with Amazon's Mechanical Turk at NAACL-HLT 2010, Los Angeles, California.
-
In Proceedings of LREC 2010, Valletta, Malta.
-
In Proceedings of COLING 2008, Manchester, UK.
-
In Proceedings of NIPS 2007, Vancouver, Canada.
-
In Proceedings of ECML 2007, Warsaw, Poland.
Patents and other publications
- Speech Recognition with Topic-Specific Language Models. US Patent (pending).
- Decreasing Textual Redundancy. Master's Thesis, 2007.
-
In Proceedings of the 2007 New York Academy of Sciences Symposium on Machine Learning, New York City.
Datasets
- A corpus of phrase-based alignments derived from the Edinburgh paraphrase corpus including tokenization fixes, dependency graphs, named entity annotations and baseline alignments generated by METEOR. See Scott Martin's description and the README for more details. Download (1.6 MB) Cite
- A small corpus of pairs of related newswire sentences with multiple human-generated fusion annotations (5 intersections, 5 unions) of varying accuracy collected via Mechanical Turk users. Download (91 KB) Cite
- A collection of prepositional phrase attachment cases over unstructured blog text. Candidates were chosen automatically and final judgments were made by humans responding to multiple-choice questions on Mechanical Turk. Download (130 KB) Cite
Miscellany
The papers that I covered for my candidacy exam on text-to-text generation are available here.
My Erdős number is at most 4 but my Bacon number is still woefully undefined.