Scalable Text Summarization: Wrap-up

Mark Maybury

http://www-i.mitre.org/resources/centers/advanced_info/mark.html

0 Pressing need
0 Importance of (intelligent) cross-fertilization 
- resources such as corpora, lexical resources/ontologies
- processes such as text segmentation, POS tagging, RST analysis ...
0 Many potential tasks and applications:  document creation, search, preview, 
substitute ...
0 Commercial applications (Searchable Lead, Oracle's Context, Microsoft summarizer)
0 Interesting new challenges:
- multi-document
- multi-lingual
- multi-media!

Key Issues

0 Input Source features:  form (length, genre (summary itself)), structure, unit
0 Purpose/Tailoring summary: authoring, retrieval, summary
0 Necessary knowledge sources (What type of features? How important? Can we compute 
these? When should we aggregate?), e.g., domain expert abstractors do a better job, BUT 
more time spent on the task does not necessarily increase quality?
0 Unit of analysis:  word, phrase, clause, sentence, para
0 Level of representation:  word frequencies/vectors, "lexical-chains", topics, 
predicate/argument, rhetorical structure
0 Selection methods:  location/document structure, most frequent/linked words, cue phrase/
rhetorical structure, corpus-based, combined
0 Presentation Generation
0 Output 
- units and order:  word, named entity, subject term, phrase, sentence, para
- access method, e.g., hypertext summary to source, interactive
0 Methods:  statistically trained, knowledge based (hand coded) features, 
cognitively motivated
0 Evaluation:  
- Sources (e.g., Federal Register)
- metrics (accuracy, speed, comprehensiveness, readability, user satisfaction)
- methods (summary to source, machine to human generated, system to system)
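
Illustration of the "combined" selection method and the "machine to human generated" evaluation listed above. This is only a minimal sketch under stated assumptions, not any workshop system: the cue phrase list, the stopword list, the equal weights, and all function names are hypothetical and chosen for illustration. Sentences are scored by normalized word frequency, by location (earlier is better), and by the presence of a cue phrase; the top-scoring sentences are returned in source order and can be compared against a human-selected extract.

```python
import re
from collections import Counter

# Hypothetical cue phrases and stopwords, for illustration only.
CUE_PHRASES = ("in conclusion", "in summary", "we propose")
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we",
             "that", "for", "on", "are", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def score_sentences(sentences, w_freq=1.0, w_loc=1.0, w_cue=1.0):
    """Combine three selection features; the weights are assumptions."""
    freq = Counter(w for s in sentences for w in tokenize(s))
    top = max(freq.values(), default=1)
    scores = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        # Frequency: mean corpus frequency of the sentence's words, in [0, 1].
        f = sum(freq[w] for w in words) / (len(words) * top) if words else 0.0
        # Location: earlier sentences score higher.
        loc = 1.0 / (i + 1)
        # Cue phrase: 1 if any listed phrase occurs in the sentence.
        cue = 1.0 if any(c in s.lower() for c in CUE_PHRASES) else 0.0
        scores.append(w_freq * f + w_loc * loc + w_cue * cue)
    return scores


def summarize(text, n=2):
    """Return the n best-scoring sentences, kept in source order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    best = sorted(range(len(sentences)),
                  key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


def precision_recall(machine, human):
    """Sentence-level overlap between machine and human extracts."""
    m, h = set(machine), set(human)
    overlap = len(m & h)
    precision = overlap / len(m) if m else 0.0
    recall = overlap / len(h) if h else 0.0
    return precision, recall
```

Usage: call summarize(text, n) on a source document, then precision_recall(summary, human_extract) against sentences chosen by a human abstractor. Changing the three weights shows how much each knowledge source contributes to the selection.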

Future Plans

0 Workshop:  
- http://www.cs.columbia.edu/~radev/ists97/ (7/31/97)
- Slides and/or papers
- Bibliographies
- Corpora
- Tools
0 AAAI Spring Symposium, Stanford, 23-25 March 1998
0 Opportunity to evaluate systems 
0 Edited Collection