Simone Teufel & Marc Moens
Language Technology Group
University of Edinburgh
1. Sentence selection and document structure
summary: human-written
abstract: machine-generated
excerpt: collection of sentences
2. Learning the contribution of heuristics
Gold standards (target extracts) in [Kupiec et al. 1995]: alignable summary sentences.
Alignment of summary sentences and document sentences:
Source of abstracts: professional abstractors
Alignment rate: 79%
3. Training and evaluation
Classifier:
Trained from corpus: $P(F_j)$ and $P(F_j \mid s \in S)$. Independence assumption between features.
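The classifier outlined above can be read as the Naive Bayes formulation of Kupiec et al.: under the independence assumption, the probability that a sentence $s$ belongs to the extract $S$ given features $F_1, \dots, F_k$ is estimated as $P(s \in S) \prod_j P(F_j \mid s \in S) / \prod_j P(F_j)$. The following is a minimal Python sketch of that estimate for discrete feature values. The function names, the add-one smoothing, and the toy training data are assumptions made for illustration, not details of the original system.

```python
from collections import Counter


def train(sentences):
    """Collect the counts needed for P(s in S), P(F_j) and P(F_j | s in S).

    `sentences` is a list of (features, in_extract) pairs, where `features`
    maps a feature name to a discrete value.
    """
    all_counts = Counter()      # counts of (feature, value) over all sentences
    extract_counts = Counter()  # counts of (feature, value) over extract sentences
    values = {}                 # observed values per feature, used for smoothing
    n_extract = 0
    for features, in_extract in sentences:
        n_extract += int(in_extract)
        for name, value in features.items():
            values.setdefault(name, set()).add(value)
            all_counts[(name, value)] += 1
            if in_extract:
                extract_counts[(name, value)] += 1
    return {
        "n_total": len(sentences),
        "n_extract": n_extract,
        "all": all_counts,
        "extract": extract_counts,
        "values": values,
    }


def score(model, features):
    """Return P(s in S) * prod_j P(F_j | s in S) / prod_j P(F_j).

    Add-one smoothing is an assumption of this sketch, so the result is a
    ranking score rather than a calibrated probability.
    """
    numerator = model["n_extract"] / model["n_total"]
    denominator = 1.0
    for name, value in features.items():
        k = len(model["values"].get(name, {value}))
        numerator *= (model["extract"][(name, value)] + 1) / (model["n_extract"] + k)
        denominator *= (model["all"][(name, value)] + 1) / (model["n_total"] + k)
    return numerator / denominator


# Toy training data (hypothetical): feature values loosely follow the poster.
toy = [
    ({"cue": "+3", "location": "p"}, True),
    ({"cue": "0", "location": "m"}, False),
    ({"cue": "0", "location": "m"}, False),
    ({"cue": "+2", "location": "p"}, True),
    ({"cue": "0", "location": "p"}, False),
]
model = train(toy)
strong = score(model, {"cue": "+3", "location": "p"})
weak = score(model, {"cue": "0", "location": "m"})
```

In this toy setting `strong` is larger than `weak`, so a sentence with a high cue phrase score in a prominent location would be ranked ahead of a neutral sentence in the middle of the text.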
Crossvalidation results (precision/recall):
| Heuristics | Indiv | Cumul |
|---|---|---|
| Location | 33% | 33% |
| Cue Phrases | 29% | 42% |
| Sentence Length | 24% | 44% |
| tf*idf | 20% | 42% |
| tf*idf + capitalization | 20% | 42% |
| Baseline | 24% | 24% |
improvement of 74% over baseline
improvement of 29% over best single method
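The table gives a single figure for precision and recall. One plausible reading, in line with the setup of Kupiec et al., is that the system selects exactly as many sentences as the gold-standard extract contains, in which case the two measures coincide. The sketch below illustrates this under that assumption; the function names and the toy scores are hypothetical.

```python
def select_top_k(scores, k):
    """Return the indices of the k highest-scoring sentences."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def precision_recall(selected, gold):
    """Compute precision and recall of a selected sentence set against a gold set."""
    correct = len(selected & gold)
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.2]  # toy classifier scores, one per sentence
gold = {0, 2, 4}                          # toy gold-standard extract (sentence indices)
selected = select_top_k(scores, k=len(gold))
precision, recall = precision_recall(selected, gold)
```

Because the selected set and the gold set have the same size, both measures divide the same number of correct sentences by the same denominator.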
4. Our data
5. Target extracts and heuristics
Whole training set:
Division into three sub training sets:
compression = # sentences in target extract / # sentences in document
| Target extract type | Compression |
|---|---|
| A | 1.2% |
| B | 3.2% |
| AB | 4.4% |
| Kupiec et al. | 3.0% |
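As a small worked illustration of the compression definition above, the following Python sketch computes the ratio for a single document. The function name and the example counts are assumptions chosen for illustration and do not come from the corpus.

```python
def compression(n_extract_sentences, n_document_sentences):
    """Return the fraction of document sentences that appear in the target extract."""
    if n_document_sentences <= 0:
        raise ValueError("document must contain at least one sentence")
    return n_extract_sentences / n_document_sentences


# Hypothetical document: 3 extract sentences out of 100 document sentences.
ratio = compression(3, 100)
as_percent = f"{ratio:.1%}"
```

For these hypothetical counts the ratio formats as 3.0%.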
| Heuristic | Values |
|---|---|
| Cue Phrases | +3, +2, +1, 0, -1 |
| Location | p, m, 0 |
| Sentence length | short or long |
| tf*idf | good or bad score |
| Title | good or bad score |
6. Cue phrases
| Score | Index terms | Cue phrase |
|---|---|---|
| 2 | we argue | we argued |
| 1 | we argue | we have argued |
| 1 | we argue | we have argued that |
| 1 | we argue | we will argue |
| 1 | I argue | what I have argued is |
| 1 | we argue | what we have argued is |
| 3 | article article | ^this article |
| 3 | article article | in this article |
| 1 | attempt attempt | is an attempt to |
| 2 | I attempt | I attempt to |
| 2 | I attempt | I have attempted |
| 2 | I attempt | I have attempted to |
| 2 | our attempt | our work attempts |
| 2 | paper attempt | the present paper is an attempt |
| 2 | paper attempt | this paper is an attempt to |
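A cue phrase list like the one above could be applied to a sentence roughly as follows. This Python sketch uses a handful of entries from the list. Lower-cased substring matching, taking the highest score among all matching phrases, and a default score of 0 when nothing matches are assumptions of the sketch, not a description of the original implementation.

```python
# A few (phrase, score) entries copied from the cue phrase list, lower-cased.
CUE_PHRASES = {
    "we argued": 2,
    "we have argued": 1,
    "in this article": 3,
    "is an attempt to": 1,
    "i attempt to": 2,
    "this paper is an attempt to": 2,
}


def cue_phrase_score(sentence, cue_phrases=CUE_PHRASES, default=0):
    """Return the highest score of any cue phrase contained in the sentence."""
    # Normalise case and whitespace before matching (an assumption of this sketch).
    text = " ".join(sentence.lower().split())
    matches = [score for phrase, score in cue_phrases.items() if phrase in text]
    return max(matches) if matches else default


s1 = cue_phrase_score("In this article we describe a parser.")
s2 = cue_phrase_score("This paper is an attempt to model discourse.")
s3 = cue_phrase_score("The cat sat on the mat.")
```

The second sentence matches both "is an attempt to" and the longer "this paper is an attempt to"; taking the maximum lets the more specific phrase decide the score.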
7. First experiment
Is Kupiec et al.'s methodology useful for our problem? YES
| Heuristics | Indiv | Cumul |
|---|---|---|
| Cue Phrases | 55.2% | 55.2% |
| Location | 32.1% | 65.3% |
| Sentence Length | 28.9% | 66.3% |
| tf*idf | 17.1% | 66.5% |
| Title | 21.7% | 68.4% |
| Baseline | 30.1% | 30.1% |
improvement of 126% over baseline
improvement of 17% over best single method
8. Second experiment
Difference between gold standards in training
Distribution of features almost identical between gold standards
9. Third experiment
Is there a data sparseness problem?
No data sparseness problem:
Distribution of features almost identical between the 24,000-sentence and 8,000-sentence training sets
10. Influence of precision on recall
11. Conclusions