Organization of Guidelines
- Getting Started with DUCView
- Summary Content Units: Definition and Illustration
- Pyramid Annotation Instructions
- General Instructions
- Labeling Guidelines
- Pyramid Examples
- Peer Annotation Instructions
- Peer Examples
A pyramid is a model predicting the distribution of information content in summaries, as reflected in the summaries humans write (Passonneau and Nenkova 2003, Nenkova and Passonneau 2004). It has long been observed that human summarizers produce summaries that have only partly overlapping content. A pyramid model explicitly represents the overlapping content in a set of model human summaries, and indicates how many of the models express each content unit. The resulting pyramid is used to evaluate the quality of information content in a new, distinct summary (a peer). Here we provide an overview of the pyramid annotation task, and a new set of annotation guidelines specifically for the 2006 Document Understanding Conference (DUC), organized by NIST (National Institute of Standards and Technology).
In our experience with different summary corpora, annotation issues differ when the parameters of the summarization task change, such as the number or length of the model summaries. The method was initially developed on a corpus of 100-word summaries for document clusters that each contained approximately ten news articles, where the articles were approximately 500 words long. In 2005 and 2006, the document clusters being summarized each contain about twenty-five 750-word articles relevant to a topic created by NIST assessors, as outlined in the DUC 2006 task description. In both 2005 and 2006, the model summaries are 250 words long, but the 2006 pyramids will contain only four model summaries, in contrast to the seven used in 2005. The 2006 guidelines given here have been revised partly on the basis of the 2005 experience, and partly because fewer models are used.
All examples used here to illustrate the pyramid annotation task are taken from sample pyramids with four models drawn from the 2005 DUC data.
If you have participated in pyramid annotation previously, you can skip the section on Summary Content Units, but we urge you to read through the other sections of the guidelines before you start. If you have not previously done pyramid annotation, you can refer to the Pyramid Examples. To get started, you need the DUCView annotation tool, and either the file of model summaries, if you are doing pyramid annotation, or the package of peer summaries, if you are doing peer annotation. See DUCView for a description of the functionality.
For pyramid creation, if you have not annotated pyramids before, we recommend that you try out a sample pyramid (see the Pyramid Examples) as a way to become familiar with the new annotation tool and the new guidelines. If you annotate part of a sample text, you can compare your annotation with the Sample Pyramids. They need not be identical, but they should be similar.
- Save the file on this download page as DUCView-1.4.jar. To run it from the command line, type "java -jar DUCView-1.4.jar". Alternatively, in a directory window showing the file, double-click on the file name.
- IMPORTANT: When you read in a new text file containing the model summaries, you must select the "Options" drop down menu, then select "Document Header RegEx" and enter a regular expression for the summary separator you find in the *txt file. See warning for an explanation, and the regular expression to use.
DUCView is an annotation tool that has two uses: the creation of pyramids from model summaries, and the annotation of peer summaries against an existing pyramid. Users should note that the appearance and functionality of the interface differ depending on which of these two tasks it is being used for, as noted below.
- Drop Down Menus:
- File: For starting a new pyramid by reading in a text file containing the model summaries, or for starting a new peer annotation by first loading an existing pyramid, then reading in a new peer summary. Also for loading, saving, or closing the annotation files you are working on. The user can also display the score for a peer annotation. For ease of reference, an option here also allows the user to print a list of the SCU labels.
- Edit: There are find, undo and redo functions. In addition, there is an autoannotate function for use during peer annotation that will automatically label sentences that are exact matches to previously annotated material, either in a pyramid or in a set of completed peers.
- Options: "Text Size" and "Look and Feel" are self-explanatory. During pyramid annotation, SCUs in the right pane can be dragged, either to move them to a new location in the tree (e.g., to group similar SCUs together for ease of reference), or to merge two SCUs. See warning for an explanation of "Document Header RegEx," and the regular expression to use.
- Help: About DucView
- Left Pane: During pyramid annotation, displays the file of model summaries. During peer annotation, displays the peer summary.
- Warning: When you read in a new text file containing the model summaries, you must select the "Options" drop down menu, then select "Document Header RegEx" and enter a regular expression for the summary separator you find in the *txt file.
If you failed to specify a "Document Header Regular Expression" (Options menu) when you read in the file, enter it now:
- Searchable, using drop down menu "Edit > Find". Note that you can search on the model text, or on the SCU labels.
- Center Pane: This applies only to peer annotation; displays the list of SCU labels along with the SCU weights.
- New SCU: This applies only to pyramid annotation; after the user selects text in the left pane, creates a new SCU in the right pane.
- Add Contributor: This button has two functions during pyramid annotation. It will add a contributor to an existing SCU, after the user selects both text in the left pane and an SCU in the right pane. For discontinuous contributors, it will add selected text to a selected contributor.
During peer annotation, select this button after selecting some text and the SCU label it matches.
- Change Label: This applies only to pyramid annotation; allows the user to edit the label.
- Set SCU Label: This applies only to pyramid annotation; after selecting a contributor label, hit this button to copy the contributor label to the SCU label.
- Remove: Removes a selected SCU or a selected contributor.
- Order: Orders the list of SCUs by weight (descending), and within each weight, alphabetically. If you have used "Options > Dragging SCU > Moves it under target SCU" to create your own ordering, hitting the "Order" button will override it.
- Collapse: Collapses the tree of SCUs so that only the labels are visible.
- Comment: For user notes on SCUs or contributors; appears in the SCU tree as an asterisk on SCU or contributor labels; visible by mousing over the asterisk.
- Right Pane: During pyramid annotation, displays the tree of SCUs created by the annotator. During peer annotation, displays the text of the model summaries; when an SCU in the center pane is selected, its contributors are highlighted in the right pane.
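The behaviour described for the "Order" button amounts to a two-key sort: weight first (descending), then label (alphabetical). The Python sketch below only illustrates that ordering and is not DUCView's own code. The `(label, weight)` tuples, the function name `order_scus`, the case-insensitive comparison, and the weight-1 SCU are assumptions made for the illustration; the three weight-4 labels come from the sample pyramid used in these guidelines.

```python
# Illustrative sketch only; this is not DUCView's implementation.
# Each SCU is represented as a hypothetical (label, weight) pair.
scus = [
    ("Wales has about 3 dozen district councils", 4),
    ("A hypothetical SCU expressed by one model", 1),  # invented for illustration
    ("Plaid Cymru wants full independence", 4),
    ("Plaid Cymru is the Welsh nationalist party", 4),
]


def order_scus(scus):
    """Sort by weight (descending), then alphabetically by label within a weight.

    Comparing lower-cased labels is an assumption; the tool may compare
    labels differently.
    """
    return sorted(scus, key=lambda scu: (-scu[1], scu[0].lower()))


for label, weight in order_scus(scus):
    print(f"W={weight}: {label}")
```

Negating the weight gives descending order on the first key while keeping ascending (alphabetical) order on the second.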
An SCU is similar to a collection of paraphrases in that it groups together words and phrases from distinct summaries into a single set, based on shared content. The words selected from one summary to go into an SCU are referred to as a contributor of the SCU. As we will see from the following examples, a contributor is not always strictly a paraphrase.
The annotator must assign a label to the SCU that expresses the shared content. The label is a concise English sentence that states what the annotator views as the meaning of the content unit. In addition, the SCU will have a weight corresponding to the number of model summaries that express the designated content. The SCU weight is automatically computed, based on the number of summaries that contribute to it, so the annotator is not responsible for assigning weights. However, annotators should keep in mind that there should be an intuitive relation between the weight and the content: information that is key to understanding the overall topic will often be more highly weighted, and more specific and/or more tangential information is likely to have lower weights.
The following SCU is an example from one of the sample pyramids (D633.CDEH.pyr) and illustrates a relatively straightforward case in which the contributors are each continuous strings (i.e., no discontinuities) whose meaning corresponds fairly directly to the label. If you click on the label link, you will see the same SCU, but the contributors are shown in their original sentential contexts. All four model summaries contributed to this SCU, so the weight is 4 (W=4).
Example 1: An SCU with relatively little variation across contributors
SCU 13 (W=4): Plaid Cymru is the Welsh nationalist party
C1: Plaid Cymru, the Welsh nationalist party
C2: the Welsh nationalist party, Plaid Cymru
C3: Plaid Cymru, the Welsh nationalist party
C4: Wales Nationalist Party (Plaid Cymru)
The next three SCUs from the same sample pyramid illustrate how the contributors can sometimes be less explicit than, or slightly different from, the label. The shared meaning is captured in the label. The label can be more explicit than one or more of the contributors, it can be more general, it can change the focus by altering the word order, or can differ from one or more of the contributors in other ways. However, the contributors and label of a given SCU should always be in as close a paraphrase relation as possible.
Example 2: An SCU illustrating contributors containing gaps
SCU 49 (W=4): Plaid Cymru wants full independence
C1: Plaid Cymru wants full independence
C2: Plaid Cymru...whose policy is to...go for an independent Wales within the European community
C3: calls by...(Plaid Cymru)...fully self-governing Wales within the European Community
C4: Plaid Cymru...its campaign for equal rights to Welsh self-determination
Example 3: An SCU illustrating how quantities, or units of measure, can sometimes disagree and still be placed in the same SCU if it appears that the writer intended to refer to the same set as other summary writers did. The label captures the observation that the quantity in question is not required to be scientifically exact.
SCU 77 (W=4): Wales has about 3 dozen district councils
C1: 37 districts in Wales
C2: 37 district councils
C3: 38 Welsh districts
C4: 37 district councils,
It is possible for an SCU to have a single contributor, in the case when only one of the analyzed summaries expresses the label of the SCU. We know from comparison of DUC 2003 and DUC 2005 data that there will be a relatively large number of SCUs of weight one in the 2006 pyramids (Passonneau et al., 2005). No SCU should have more contributors than there are model summaries; in fact, the annotation tool will enforce this constraint.
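The relation between contributors, weight, and the number of model summaries can be made concrete with a small data structure. The Python sketch below is an illustration under stated assumptions, not a description of how DUCView stores SCUs. The class name `SCU`, the identifiers in `MODEL_IDS`, and the pairing of each contributor with a particular model are hypothetical. The sketch also simplifies by treating each contributor as a single string, whereas a real contributor may be discontinuous.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the four model summaries of one pyramid.
MODEL_IDS = ("model1", "model2", "model3", "model4")


@dataclass
class SCU:
    label: str
    # Maps a model-summary identifier to the text selected from that summary.
    contributors: dict = field(default_factory=dict)

    def add_contributor(self, model_id: str, text: str) -> None:
        # At most one contributor per model summary, so the weight can
        # never exceed the number of model summaries.
        if model_id not in MODEL_IDS:
            raise ValueError(f"unknown model summary: {model_id}")
        if model_id in self.contributors:
            raise ValueError(f"{model_id} already contributes to this SCU")
        self.contributors[model_id] = text

    @property
    def weight(self) -> int:
        # The weight is derived from the contributors, never assigned by hand.
        return len(self.contributors)


# Contributors of SCU 13 from Example 1; which model wrote which
# contributor is an assumption made only for this sketch.
scu13 = SCU("Plaid Cymru is the Welsh nationalist party")
texts = [
    "Plaid Cymru, the Welsh nationalist party",
    "the Welsh nationalist party, Plaid Cymru",
    "Plaid Cymru, the Welsh nationalist party",
    "Wales Nationalist Party (Plaid Cymru)",
]
for model_id, text in zip(MODEL_IDS, texts):
    scu13.add_contributor(model_id, text)

print(f"W={scu13.weight}")
```

Because the weight is computed from the contributor map, an SCU with a single contributor simply has weight one, and a fifth contributor cannot be added without raising an error.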
- Example SCUs
- Sample Pyramids:
- Create a Sample Pyramid:
- raw text D633-CDEH.txt
- raw text D695-CEFJ.txt
- Regular expression separator:
When you read in the above file, select "Document Header RegEx" from the "Options" drop-down menu, and enter a regular expression for the string that separates the documents. For example:
You should get a message in a dialogue box saying:
"Your regular expression found 4 documents"
This will ensure that DUCView knows what constraints to enforce on the maximum weight of an SCU.
- Peer Annotation Examples
- Sample Peer Summary Annotation
- Raw Peer Summary: D633.M.250.G.10
- Annotated Peer Summary: D633.M.250.G.10.pan This is the annotation of the summary shown above, using the D633-CDEH.pyr sample pyramid.
- To Create a Sample Peer Summary Annotation: Read the Raw Peer Summary shown above into DUCView, follow the Peer Annotation Instructions, then compare your result to the *pan file shown above.
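Returning to the "Document Header RegEx" step under Create a Sample Pyramid: the idea is that a regular expression matching each summary's header line lets the tool count the model summaries and split the file into documents. The Python sketch below illustrates that idea only. The header lines and the pattern are invented for the example and are NOT the separator used in the DUC sample files. DUCView is run with Java, whose regular expression syntax may differ from Python's `re` module in some details.

```python
import re

# Hypothetical file contents: these header lines are invented for
# illustration and are NOT the separator found in the DUC sample files.
text = """=== SUMMARY 1 ===
First model summary text.
=== SUMMARY 2 ===
Second model summary text.
=== SUMMARY 3 ===
Third model summary text.
=== SUMMARY 4 ===
Fourth model summary text.
"""

# A regular expression that matches each (hypothetical) header line.
header_regex = re.compile(r"^=== SUMMARY \d+ ===$", re.MULTILINE)

# One header per model summary.
headers = header_regex.findall(text)

# The text between headers; the empty piece before the first header is dropped.
documents = [part.strip() for part in header_regex.split(text) if part.strip()]

print(f"Your regular expression found {len(headers)} documents")
```

If the count reported does not equal the number of model summaries in the file, the expression is matching too much or too little and should be adjusted before annotation begins.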