A content planner is a major component of a generation system, responsible for determining the content and structure of the generated output. It takes a knowledge base and communicative goals as input and produces a document plan as output. It can use content planning schemata to guide the construction of the document plan; building such schemata is normally regarded as tightly coupled with the semantics and idiosyncrasies of each particular domain. In the thesis outlined in this proposal, I investigate the automatic construction of schemata from a resource consisting of texts and associated knowledge bases. This resource is a collection of human-produced texts together with the data a generation system is expected to use to construct texts that fulfill the same communicative goals. Schemata are better suited to descriptive texts with a strong topical structure and little intentional content. I therefore focus on domains of this kind in which the texts are also rich in anchors (pieces of information copied directly from the input knowledge base). My methods apply shallow understanding techniques to obtain information about the aggregative behavior of the texts. The proposed learning process involves matching knowledge within the text, mining order constraints on the knowledge side, and using those constraints to build the schema. Evaluation criteria throughout the process are also discussed.
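The three learning steps named above (matching, mining order constraints, building the schema) can be illustrated with a minimal sketch. It is not the method developed in the thesis: it assumes anchors are matched by exact substring search, that a constraint is a simple "attribute A is mentioned before attribute B" relation, and that the schema can be approximated by a single linear order. All function names, parameters, and the toy data are hypothetical. It needs Python 3.9 or later for `graphlib`.

```python
from collections import Counter
from graphlib import TopologicalSorter
from itertools import combinations


def match_anchors(text, knowledge_base):
    """Return KB attribute names ordered by where their values first appear in the text.

    Assumption: an anchor is a value copied verbatim, so substring search suffices.
    """
    positions = []
    for attribute, value in knowledge_base.items():
        index = text.find(str(value))
        if index >= 0:
            positions.append((index, attribute))
    return [attribute for _, attribute in sorted(positions)]


def mine_order_constraints(sequences, min_support=2, min_confidence=1.0):
    """Keep pairs (a, b) where a precedes b often enough across the sequences.

    min_support and min_confidence are hypothetical thresholds for this sketch.
    """
    before = Counter()
    for sequence in sequences:
        # combinations() preserves sequence order, so `first` precedes `second`.
        for first, second in combinations(sequence, 2):
            before[(first, second)] += 1
    constraints = set()
    for (first, second), count in before.items():
        total = count + before[(second, first)]
        if total >= min_support and count / total >= min_confidence:
            constraints.add((first, second))
    return constraints


def order_attributes(constraints):
    """Build a linear order consistent with the constraints (a stand-in for a schema).

    Raises graphlib.CycleError if the constraints are contradictory.
    """
    sorter = TopologicalSorter()
    for first, second in constraints:
        sorter.add(second, first)  # `second` must come after `first`
    return list(sorter.static_order())


# Toy text/knowledge-base pairs, invented for illustration only.
corpus = [
    ("Ada Lovelace, born in 1815, was a mathematician.",
     {"name": "Ada Lovelace", "birth_year": 1815, "occupation": "mathematician"}),
    ("Alan Turing (1912) worked as a logician.",
     {"name": "Alan Turing", "birth_year": 1912, "occupation": "logician"}),
]
sequences = [match_anchors(text, kb) for text, kb in corpus]
print(order_attributes(mine_order_constraints(sequences)))
# prints ['name', 'birth_year', 'occupation']
```

With `min_confidence=1.0`, any pair that appears in both orders is dropped. A lower threshold admits noisier constraints and can produce cycles, which a real schema learner would have to resolve.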
Last update: Thu Mar 25 06:13:11 EST 2004