Sentence Extraction
as a classification task


Simone Teufel & Marc Moens

Language Technology Group
University of Edinburgh






1. Sentence selection and document structure


2. Learning the contribution of heuristics


[Kupiec et al. 1995]'s gold standards (target extracts): alignable summary sentences.

Alignment of summary sentences
and document sentences:


Source of abstracts: professional abstractors
Alignment rate: 79%

3. Training and evaluation


Classifier:





Trained from corpus: $P(F_j)$ and $P(F_j \mid s \in S)$. Independence assumption
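
The training step can be sketched as follows. This is a minimal illustration of a Kupiec-style naive Bayes scorer, not the authors' implementation: it assumes discrete feature values and add-one smoothing, and the names `train` and `score`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from math import prod

def train(sentences, labels):
    """Count feature values overall and within the target extract S.

    sentences: list of tuples of discrete feature values, one tuple per sentence
    labels:    list of booleans, True if the sentence is in the target extract S
    """
    n_feats = len(sentences[0])
    all_counts = [Counter() for _ in range(n_feats)]
    pos_counts = [Counter() for _ in range(n_feats)]
    for feats, in_extract in zip(sentences, labels):
        for j, value in enumerate(feats):
            all_counts[j][value] += 1
            if in_extract:
                pos_counts[j][value] += 1
    return {"n": len(sentences), "n_pos": sum(labels),
            "all": all_counts, "pos": pos_counts}

def score(model, feats):
    """P(s in S) * prod_j P(F_j | s in S) / P(F_j), using the independence assumption.

    Add-one smoothing is an assumption of this sketch, not something the slides state.
    """
    prior = model["n_pos"] / model["n"]
    ratios = []
    for j, value in enumerate(feats):
        n_values = len(model["all"][j])  # distinct values seen for feature j
        p_given_s = (model["pos"][j][value] + 1) / (model["n_pos"] + n_values)
        p_feature = (model["all"][j][value] + 1) / (model["n"] + n_values)
        ratios.append(p_given_s / p_feature)
    return prior * prod(ratios)

# Hypothetical toy corpus: (location, sentence length) per sentence.
toy_feats = [("p", "long"), ("p", "long"), ("m", "short"),
             ("m", "long"), ("m", "short"), ("0", "short")]
toy_labels = [True, True, False, False, False, False]
model = train(toy_feats, toy_labels)
```

Because the factors are estimated independently, the result is best read as a ranking score rather than a calibrated probability: sentences are sorted by `score` and the top ones are extracted.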


Cross-validation results (precision/recall):

Heuristics                Indiv   Cumul
Location                  33%     33%
Cue Phrases               29%     42%
Sentence Length           24%     44%
tf*idf                    20%     42%
tf*idf + capitalization   20%     42%
Baseline                  24%
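
The table gives a single figure for precision and recall. One way the two measures can coincide is sketched below, assuming the classifier selects exactly as many sentences as the target extract contains; the slides do not state this, and the function name `precision_recall` and the sentence ids are hypothetical.

```python
def precision_recall(selected, target):
    """Precision and recall of selected sentence ids against a target extract."""
    selected, target = set(selected), set(target)
    hits = len(selected & target)
    return hits / len(selected), hits / len(target)

# Toy example: both sets hold four sentence ids, two of which agree.
p, r = precision_recall({1, 4, 7, 9}, {1, 2, 7, 8})
```

With equal set sizes the two denominators are the same, so precision and recall are necessarily equal.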



improvement of 74% over baseline
improvement of 29% over best single method

4. Our data


5. Target extracts and heuristics

Whole training set:




Division into three sub training sets:

[Figure: resultslong.ps]




compression = # sentences in target extract / # sentences in document

Target extract type   Compression
A                     1.2%
B                     3.2%
AB                    4.4%
Kupiec et al.         3.0%
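
As a worked instance of the compression definition above, here is a small sketch; the function name `compression` and the document sizes are hypothetical and not taken from the corpus.

```python
def compression(n_extract_sentences, n_document_sentences):
    """Compression = # sentences in target extract / # sentences in document."""
    if n_document_sentences <= 0:
        raise ValueError("document must contain at least one sentence")
    return n_extract_sentences / n_document_sentences

# Hypothetical document: 200 sentences, 5 of them in the target extract.
ratio = compression(5, 200)
print(f"{ratio:.1%}")  # prints 2.5%
```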



Heuristics:  
Cue Phrases: +3, +2, +1, 0, -1
Location: p, m, 0
Sentence length: short or long
tf*idf: good or bad score
Title: good or bad score
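
The value sets listed above define a small discrete feature space. The sketch below only enumerates it; the dictionary keys are hypothetical names, and since the slides do not say how each value is computed, no thresholds or scoring rules are assumed here.

```python
from itertools import product

# Value sets as listed on the slide; key names are illustrative only.
FEATURE_VALUES = {
    "cue_phrases": [3, 2, 1, 0, -1],
    "location": ["p", "m", "0"],
    "sentence_length": ["short", "long"],
    "tfidf": ["good", "bad"],
    "title": ["good", "bad"],
}

def is_valid(feature_vector):
    """Check that a tuple has one permitted value per heuristic, in order."""
    return (len(feature_vector) == len(FEATURE_VALUES)
            and all(v in allowed
                    for v, allowed in zip(feature_vector, FEATURE_VALUES.values())))

# Every combination of feature values a sentence could receive.
all_vectors = list(product(*FEATURE_VALUES.values()))

# Under the independence assumption only the per-feature values need estimates.
n_values_to_estimate = sum(len(v) for v in FEATURE_VALUES.values())
```

There are 120 possible feature vectors but only 14 individual feature values, which is why treating the heuristics as independent keeps the number of probabilities to estimate small.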







6. Cue phrases




2 we         argue        we argued
1 we         argue        we have argued
1 we         argue        we have argued that
1 we         argue        we will argue
1 I          argue        what I have argued is
1 we         argue        what we have argued is

3 article    article      ^this article
3 article    article      in this article

1 attempt    attempt      is an attempt to
2 I          attempt      I attempt to
2 I          attempt      I have attempted
2 I          attempt      I have attempted to
2 our        attempt      our work attempts
2 paper      attempt      the present paper is an attempt
2 paper      attempt      this paper is an attempt to
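
A lookup over such a list might be sketched as follows. Several things here are assumptions rather than facts from the slides: that the first column is the cue phrase score, that `^` marks a sentence-initial match, that plain substring matching is used, and that the highest matching score wins. The name `cue_phrase_score` is hypothetical, and only a few of the entries above are included.

```python
# (score, phrase) pairs copied from the list above, lower-cased for matching.
CUE_PHRASES = [
    (2, "we argued"),
    (1, "we have argued"),
    (3, "^this article"),
    (3, "in this article"),
    (1, "is an attempt to"),
    (2, "this paper is an attempt to"),
]

def cue_phrase_score(sentence, lexicon=CUE_PHRASES):
    """Return the highest score among matching cue phrases, or 0 if none match."""
    text = " ".join(sentence.lower().split())  # normalise case and whitespace
    matched = []
    for phrase_score, phrase in lexicon:
        if phrase.startswith("^"):
            # Assumed meaning of '^': the phrase must start the sentence.
            hit = text.startswith(phrase[1:])
        else:
            hit = phrase in text
        if hit:
            matched.append(phrase_score)
    return max(matched) if matched else 0
```

For example, `cue_phrase_score("This paper is an attempt to model discourse.")` matches both "is an attempt to" and "this paper is an attempt to" and returns the higher score, 2.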



7. First experiment



Is Kupiec et al.'s methodology useful for our problem? YES




Heuristics        Indiv   Cumul
Cue Phrases       55.2%   55.2%
Location          32.1%   65.3%
Sentence Length   28.9%   66.3%
tf*idf            17.1%   66.5%
Title             21.7%   68.4%
Baseline          30.1%




improvement of 126% over baseline
improvement of 17% over best single method

7. Second experiment


Difference between gold standards in training





[Figure: data.ps]




Distribution of features almost identical
between gold standards

8. Third experiment







Is there a data sparseness problem?



[Figure: gold.ps]




No data sparseness problem:

Distribution of features almost identical between 24,000 and 8,000 sentence training sets
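
One simple way to compare feature distributions between two training sets is sketched below. The slides do not say how similarity was judged, so the measure used here (largest absolute difference in relative frequency) is an assumption, as are the function names and the toy data.

```python
from collections import Counter

def value_distribution(values):
    """Relative frequency of each feature value in a training set."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def max_abs_difference(dist_a, dist_b):
    """Largest absolute gap in relative frequency over all feature values."""
    keys = set(dist_a) | set(dist_b)
    return max(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)

# Hypothetical location values for a larger and a smaller training set.
large_set = ["p"] * 30 + ["m"] * 60 + ["0"] * 10
small_set = ["p"] * 3 + ["m"] * 6 + ["0"] * 1
diff = max_abs_difference(value_distribution(large_set),
                          value_distribution(small_set))
```

A value of `diff` close to zero would indicate that the smaller set reproduces the feature distribution of the larger one.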


9. Influence of precision on recall





10. Output of sentence selection step




11. Conclusions


Simone Teufel
8/18/1997