Domain-independent Single Document Summarization through Focus Analysis

Summarization can be viewed as a process of answering questions about the focus of an article. We present a two-step summarization system that automatically determines which questions to ask about an article and then finds the answers in the text. The answers determine both the content and the ordering of the resulting summary.

We show how present-day information extraction technology can be used to identify salient term types, such as people, places, organizations and technical terms, which can serve as the foci of a document regardless of its domain. Using several features, such as frequency and term type, we identify the foci in the text and the relationships between them that the text describes. The summary is built from those sentences and clauses that cover the foci and their plausible relationships.
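As a rough illustration of the idea, the following minimal sketch ranks candidate foci by frequency and term type and then keeps the sentences that mention two foci together. It is not the system described here: the function names `rank_foci` and `select_sentences`, the `TYPE_WEIGHT` values, the substring matching, and the assumption that terms arrive already typed by an information extraction step are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical weights per term type; the source gives no values.
TYPE_WEIGHT = {"person": 1.5, "organization": 1.3, "place": 1.2, "technical": 1.0}


def rank_foci(sentences, term_types, top_k=2):
    """Score each typed term by (sentence frequency x type weight); return the top_k terms."""
    freq = Counter()
    for sent in sentences:
        lowered = sent.lower()
        for term in term_types:
            # Naive substring match stands in for real term recognition.
            if term.lower() in lowered:
                freq[term] += 1
    scored = {t: freq[t] * TYPE_WEIGHT.get(term_types[t], 1.0) for t in freq}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda t: (-scored[t], t))[:top_k]


def select_sentences(sentences, foci):
    """Prefer sentences mentioning two or more foci (a plausible relationship);
    otherwise fall back to sentences mentioning any focus. Source order is kept."""
    def mentioned(sent):
        lowered = sent.lower()
        return [f for f in foci if f.lower() in lowered]

    related = [s for s in sentences if len(mentioned(s)) >= 2]
    if related:
        return related
    return [s for s in sentences if mentioned(s)]


if __name__ == "__main__":
    sentences = [
        "Alice Smith joined Acme Corp in Boston.",
        "Acme Corp builds parsers.",
        "Alice Smith leads the parser team.",
        "The weather was mild.",
    ]
    # Assumed output of an upstream extraction step: term -> type.
    term_types = {"Alice Smith": "person", "Acme Corp": "organization", "Boston": "place"}
    foci = rank_foci(sentences, term_types)
    print(foci)
    print(select_sentences(sentences, foci))
```

Treating co-occurrence within a sentence as evidence of a relationship is a deliberate simplification; a fuller treatment would work at the clause level and use the features the extraction step provides.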