AMTA’04
Seventh Interlingua Workshop

Determining Interlingua Utility
for Machine Translation

Saturday, October 2, 2004
9:00 am - 5:00 pm
Washington, DC, USA

 

Latest News

The workshop has concluded. See the Final Program for all presentations and the results of the afternoon discussion.

Workshop Description

While it is widely agreed that interlingual transfer is the ultimate goal in Machine Translation (MT), much work remains to be done in building interlingual representations for MT systems. It is difficult to determine whether a representation is a good one or, in the absence of a gold standard, at least a useful one. Evaluation of interlingual representations involves several levels of measurement. First, the representation can be measured in ontological terms and through its coverage, depth, complexity, and resulting graph structure. Second, the representation and its accompanying tools can be measured by how consistently data can be analyzed into the representation, as assessed through inter-annotator agreement. Third, the representation can be measured by applying the resulting structure to a task, in this case MT: a given text is first analyzed into an interlingual (IL) representation, and output is then generated from that representation, such as sentences that can be compared with the original text. Each of these evaluation strategies is complex because each involves more than one source of variation.

In this workshop, we explore the problem of evaluating interlingual representations in the MT context. For the morning portion of the workshop, we invite submissions related to the problem of evaluating interlingual representations and the resulting text. For the afternoon session, we encourage participation in the task presented next.

The Workshop Task

At the Fifth Interlingua Workshop, held in October 2002, the focus was on inter-coder reliability in coding thematic roles. Participants were provided with a dependency structure for each of 11 sentences. Each word was then to be assigned a thematic role from a list of thematic roles previously provided and defined by the workshop organizers.  At the Sixth Interlingua Workshop, held in October 2003, the participants marked up and compared events, objects, and states in a multilingual corpus of a UNESCO Courier article in fifteen languages (plus English).
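Inter-coder reliability of the kind examined at the Fifth Workshop is often summarized with a chance-corrected statistic such as Cohen's kappa. The sketch below is only an illustration under stated assumptions: the page does not say which agreement measure the workshop used, the function name `cohen_kappa` is hypothetical, and the role labels are invented rather than taken from the workshop data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that both coders pick the same label,
    # given each coder's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical thematic-role labels for six words (illustrative only).
coder_1 = ["agent", "theme", "goal", "agent", "theme", "source"]
coder_2 = ["agent", "theme", "goal", "theme", "theme", "source"]
print(round(cohen_kappa(coder_1, coder_2), 3))
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. With more than two coders, a multi-rater statistic would be needed instead.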

Although participants will be invited to write a short paper for the workshop, the primary aim is to determine an upper limit on the validity of an Interlingua for translation purposes.  This year's task will involve an exercise of Manual Interlingual Translation.  There are two phases to the task: Task A(nalysis) and Task G(eneration).

Task A

For Task A, each participant is to provide four items: (1) a foreign language text, (2) one or more English translations, (3) an interlingual representation of the foreign language text, and (4) a description of the Interlingua used.  The document should be no longer than 300 words, counted in the English translation. Participants who do not have access to parallel text for their language of interest should contact Nizar Habash (habash@cs.columbia.edu) for help locating such text.

Task G (with report)

In Task G, participants will receive the Interlingua and Interlingua description submitted by other participants.  The result of Task G is an English translation created from the Interlingua.

Participants will provide a (joint) written report for the workshop on the process and results of their analysis and generation. These reports will be presented during the morning session of the workshop. The afternoon will be devoted to a general discussion of the task and an examination of Interlingua utility, Manual Translation Quality (as measured by an automatic metric such as BLEU), cross-linguistic variation, and variation across multiple English versions of the same text. The pair of participants achieving the best Manual Translation Quality will receive a valuable prize and the admiration and envy of their colleagues.
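To make the idea of scoring Manual Translation Quality with an automatic metric concrete, here is a minimal sketch of a simplified, sentence-level BLEU-style score. It is not the official BLEU implementation and not necessarily what the organizers used: it assumes whitespace tokenization, uses only unigram and bigram precision, and the name `simple_bleu` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, references, max_n=2):
    """Simplified BLEU: clipped n-gram precision with a brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate length.
    closest = min((len(r) for r in refs), key=lambda length: (abs(length - len(cand)), length))
    penalty = 1.0 if len(cand) >= closest else math.exp(1.0 - closest / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical generated output and reference translation (illustrative only).
generated = "the cat sat on the mat"
reference_translations = ["the cat is on the mat"]
print(round(simple_bleu(generated, reference_translations), 3))
```

Passing several English translations of the same source text as `references` rewards output that matches any of them, which is one way to account for variation across multiple English versions.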

Task G web page

Submission Guidelines:

For the paper-only portion of the workshop, participants should send their paper in Word or PDF format by email to Nizar Habash (habash@cs.columbia.edu) by Friday, July 23, 2004 [extended to September 7th]. Submissions should include author contact information, title, abstract, and full text of 4-6 pages. A workshop URL will be created for the dissemination of ongoing information.

Accepted workshop papers will be published by AMTA, and authors will be asked to follow the AAAI formatting instructions for their final copy. These instructions can be found at http://www.aaai.org/Publications/Templates/aaai.pdf and a template can be downloaded from http://www.aaai.org/Publications/Templates/Author-kit.zip. Note, however, that the initial submission need not conform to these guidelines.

Open Task G

The open Task G is an exercise in Manual Interlingual Translation with the primary aim of determining an upper limit on the validity of an Interlingua for translation purposes.  The task involves generating English text from the interlinguas produced by all the Task A participants. The table below links to the samples and instructions submitted by participants in Task A.  The last column specifies the minimum portion of each submission from which text must be generated.

 

Institution-Interlingua | Sample | Genre/Language | Directions | Required Subset
CMU-IL | | Technical Manual/Spanish | | First 2 sentences
FSC-IL | | Literary/Arabic | | First sentence (third paragraphs in IL until sentence_end)
HRL-LCS | | Medical/Farsi | | First 2 sentences
IAMTC-IL1 | | Economic News/Arabic | | Sentence 1 and Sentence 2
CMU/ISI/NMSU-Pangloss-IL | | Economic News/Japanese | None | First Sentence
NMSU-IR | | Business Letter/English, Spanish, German | | First two sentences
PISA-IL | | Literary/Italian | | First sentence (Lexemes #1 to #65)
UMBC-TMR | | News | | Sentence #2
UMD-LCS | | News/Chinese | | First Sentence

 

Please submit your English output to habash@cs.columbia.edu by September 15, 2004.

 

 

Workshop Banquet

The workshop banquet will be held the evening before the workshop, since the workshop takes place the day after AMTA ends.  The venue is Bistro Francais:

    http://www.washingtonian.com/dining/Profiles/BistroFr.html

The expected cost is around $50 per person total.

Please RSVP to habash@cs.columbia.edu by Wednesday September 22, 2004.

Final Program

Workshop Proceedings

 

Time | Presentation | Presenter(s)

Morning Session: Workshop Task (9:00-12:30)
9:00-9:10 | Workshop Opening | Organizers
9:10-9:30 | Workshop Task: Description and Results | Nizar Habash
9:30-10:30 | Presentations I |
 | FSC-IL | Stephen Taylor
 | HRL-LCS | Robert Belvin
 | UMD-LCS | Bonnie Dorr
10:30-10:45 | Break |
10:45-12:00 | Presentations II |
 | IAMTC-IL1 | Keith Miller and Lori Levin
 | CMU-Kantoo | Teruko Mitamura
 | UMBC-TMR | Sergei Nirenburg
 | NMSU-IR | David Farwell
 | CMU/ISI/NMSU-Pangloss-IL | Ed Hovy

Lunch Break (12:00-1:30)

Afternoon Session: Open Discussion (1:30-5:00)
1:30-5:00 | Discussion Topic: Phenomena that should be represented by an Interlingua | Open to all
 | Discussion Notes (thanks to Ed Hovy) |

 

Workshop Organizers

Dr. Nizar Habash, Center for Computational Learning Systems, Columbia University. habash@cs.columbia.edu (Chair)

Dr. Bonnie Dorr, Computer Science Department, University of Maryland College Park. bonnie@umiacs.umd.edu

Dr. Eduard Hovy, Director of the Natural Language Group, Information Sciences Institute, University of Southern California. hovy@isi.edu

Florence Reeder, MITRE. freeder@mitre.org