Natural Language Processing
Columbia University


Tools

Tools on this page are available free of charge for educational, research, and in-house use. For information on commercial use of any of these tools, please contact Columbia Technology Ventures (email: techventures@columbia.edu; phone: +1 212-854-8444).


Quoted Speech Attribution Corpus

Developed by David K. Elson
This corpus collects over 3,000 instances of quoted speech from six works of 19th- and 20th-century literature, along with annotations identifying the speaker (if any) of each quote among the character names and nominals present in the text. Related publication: Elson and McKeown, "Automatic Attribution of Quoted Speech in Literary Narrative," AAAI 2010. This material is based on research supported in part by the U.S. National Science Foundation (NSF) under IIS-0935360. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

MADA

Developed by Nizar Habash and Owen Rambow
A full morphological tagger for Modern Standard Arabic.

LCseg

Developed by Michel Galley
A domain-independent discourse segmenter based on lexical cohesion.

LexChainer

Developed by Michel Galley
A tool to find semantically related words within unrestricted texts.

LinkIT

A tool for identifying and relating noun phrases within a document.

Centrifuser

Developed by Min-Yen Kan
Centrifuser is a domain- and genre-specific multidocument summarization system. It builds both an extract-based summary and indicative document cluster summaries. The extract summary gives a high-level overview of the query topic, suited to users who are browsing. The indicative document cluster summaries differentiate the documents from each other as much as possible, routing users to the particular documents that can meet their underspecified information needs. Centrifuser was developed as part of the NSF's DLI 2 initiative and focuses on patient health care documents.

Annotated Bibliography Corpus

Developed by Min-Yen Kan
We have collected 2000 annotated bibliography entries from the web and put them into a standardized XML format. We have further annotated 100 of these entries with semantic tags that describe the types of document-derived and metadata features that play a role in these summaries. Annotated bibliography entries are a good source for research on corpus-based summarization, as they provide information about what to include and how to write and stylize indicative summaries.

FUF

Developed by Michael Elhadad
FUF stands for Functional Unification Formalism.

CFUF

Developed by Michael Elhadad and Mark Kharitonov
CFUF is a graph-based implementation of the FUF language, written in C and embedded within a Scheme interpreter.

Surge

Developed by Michael Elhadad and Jacques Robin
Surge is a syntactic realization grammar for text generation.

CREP

Developed by Duford
CREP is a regular expression finder for linguistic patterns.

Segmenter

Developed by Min-Yen Kan
Segmenter is a text segmentation program.

Verber

Developed by Min-Yen Kan, Judith Klavans and Kathleen McKeown
Verber is designed to conflate semantically related verbs.

Webmaster: wm2174x[at]xcolumbia.edu. Last updated: 08.14.2013