Separating the evaluation into components and developing measures for each component. Metrics that can be obtained for individual output instances include fidelity to the source content, fluency, grammaticality, and style. Other metrics apply at the system or task level, such as coverage, variability, and development time. Ad hoc measures of the kind Matthew mentioned can also be useful, but I will argue that standardizing the measures, and identifying the distinct aspects of output quality, will facilitate objective measurement of progress. One goal would be to reduce, quantify, and control the effects of human evaluators; another would be to establish a basis for comparing the benefits offered by statistical generation techniques against traditional methods.

Integration of statistical and knowledge-based models. I believe there is an opportunity to go beyond the initial statistics-does-all approach by judiciously re-inserting traditional generation knowledge models and mechanisms into the process. I will give a couple of examples where such knowledge would help a statistical model, and where obtaining the same performance within the statistical paradigm alone would be less efficient than the integrated approach.
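The component separation above can be sketched as follows. This is a minimal illustration, not a proposed metric suite: the scorers (fluency, grammaticality, fidelity) are hypothetical placeholders standing in for whatever human ratings or automatic measures a standardized evaluation would adopt; only the structure, reporting each quality aspect separately rather than a single ad hoc number, is the point.

```python
# Minimal sketch of component-wise evaluation. The three scorers below
# are toy placeholders (assumptions), not real metrics.
from statistics import mean

def fluency(text):
    # Placeholder: a real measure might be a language-model score.
    return 1.0 if text.endswith(".") else 0.5

def grammaticality(text):
    # Placeholder: a real measure might be a parser-based check.
    return 1.0 if text[:1].isupper() else 0.5

def fidelity(text, source_facts):
    # Placeholder: fraction of source facts realized in the output.
    covered = sum(1 for fact in source_facts if fact in text)
    return covered / len(source_facts)

def evaluate(text, source_facts):
    """Report each quality component separately, plus a summary score."""
    scores = {
        "fidelity": fidelity(text, source_facts),
        "fluency": fluency(text),
        "grammaticality": grammaticality(text),
    }
    scores["overall"] = mean(scores.values())
    return scores

report = evaluate("John scored a goal.", ["John", "goal"])
```

Keeping the components separate also serves the stated goal of controlling evaluator effects: disagreement between human judges can be quantified per component instead of being averaged away in one opaque score.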
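One common shape for the integration argued for above is knowledge-based filtering of a statistical n-best list. The sketch below is an assumption-laden illustration: the candidate list, scores, and the toy agreement check are all invented for the example. It shows how a small piece of traditional grammatical knowledge can rule out a candidate that a pure statistical model would need much more training data to reject.

```python
# Minimal sketch of hybrid generation: a hypothetical statistical
# generator proposes scored candidates; a knowledge-based constraint
# prunes them before the statistical score ranks the survivors.

CANDIDATES = [  # hypothetical (text, statistical score) n-best list
    ("The results was significant.", 0.41),
    ("The results were significant.", 0.39),
]

def agrees(text):
    """Toy knowledge-based check: plural subject 'results' forbids 'was'."""
    words = text.split()
    if "results" in words:
        return "was" not in words
    return True

def pick(candidates):
    # Knowledge model filters; if it rejects everything, fall back to
    # the unfiltered list so the system still produces output.
    valid = [c for c in candidates if agrees(c[0])] or candidates
    return max(valid, key=lambda c: c[1])[0]

best = pick(CANDIDATES)
```

Here the statistically higher-scoring candidate is ungrammatical; the constraint removes it cheaply, whereas matching that behavior within the statistical paradigm alone would require enough data for the model to learn the agreement pattern itself.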