This approach is applied to the summarization of Japanese news articles.
The summaries are about 30% of the length of the original text. On
average, this method extracts 50% less text than the simple
title-keyword method.

Given a newspaper article and a base corpus, the word co-occurrences
with the highest resolving power are identified. These co-occurrences
are used to establish links between the paragraphs of the article. The
paragraph with the largest number of links to other paragraphs is
considered the most significant one.
Though designed and tested for the Portuguese language, the statistical
nature of our proposal should ensure its portability to other languages.

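The paragraph-linking step described above can be illustrated with a
minimal sketch. It is not the authors' implementation. It assumes that
the co-occurrences with high resolving power have already been selected
(the scoring against the base corpus is not specified here), and that
two paragraphs count as linked when they share at least one selected
word pair. The names `cooccurrences`, `most_linked_paragraph` and
`significant_pairs` are hypothetical.

```python
from itertools import combinations


def cooccurrences(tokens):
    """Return the unordered pairs of distinct words found in one paragraph."""
    return {frozenset(pair) for pair in combinations(sorted(set(tokens)), 2)}


def most_linked_paragraph(paragraphs, significant_pairs):
    """Return (index, link_counts) for the paragraph linked to the most others.

    paragraphs: non-empty list of token lists, one per paragraph.
    significant_pairs: set of frozensets, the co-occurrences assumed to have
    been selected beforehand for their resolving power.
    Assumption: two paragraphs are linked if they share a selected pair.
    """
    # Keep, for each paragraph, only the co-occurrences that were selected.
    kept = [cooccurrences(p) & significant_pairs for p in paragraphs]

    # Count, for each paragraph, how many other paragraphs it is linked to.
    links = [0] * len(paragraphs)
    for i, j in combinations(range(len(paragraphs)), 2):
        if kept[i] & kept[j]:
            links[i] += 1
            links[j] += 1

    # Ties are resolved in favour of the earliest paragraph.
    best = max(range(len(paragraphs)), key=links.__getitem__)
    return best, links


paragraphs = [
    ["bank", "rate", "rise"],
    ["bank", "rate", "cut"],
    ["football", "match"],
    ["rate", "bank", "policy"],
]
significant_pairs = {frozenset(("bank", "rate"))}
best, links = most_linked_paragraph(paragraphs, significant_pairs)
```

In this toy input the three paragraphs that mention both "bank" and
"rate" are linked to each other, and the unrelated paragraph has no
links. Breaking ties by position is a choice made for the sketch only.
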
We report on a replication of this experiment with different data:
summaries for our documents were not written by professional
abstractors, but by the authors themselves. This produced fewer
alignable sentences to train on. We use alternative `meaningful'
sentences (selected by a human judge) as training and evaluation
material, because this has advantages for the subsequent automatic
generation of more flexible abstracts. We quantitatively compare
the two different strategies for training and evaluation (viz.
alignment vs. human judgement); we also discuss qualitative
differences and consequences for the generation of abstracts.