Empirical Realization Ranking
Abstract
This thesis develops a new approach to the problem of indeterminacy in grammar-based natural language generation (NLG). The problem is that, for a given input semantic representation, the grammar may allow many, even thousands of, alternative surface realizations. The traditional way of dealing with this is to rank the generated strings using a surface-oriented n-gram language model (LM). This thesis instead develops a linguistically informed approach based on features that are keyed to the internal structure of the realizations. The approach extends the methodology previously used for statistical parsing with unification-based grammars and adapts it to the context of generation. This allows us to train treebank-based discriminative realization rankers using modeling frameworks such as Maximum Entropy (MaxEnt) and Support Vector Machines (SVMs). The training data is based on the novel notion of a generation treebank, which we show how to create automatically from an existing parse-oriented treebank.
For reference, we also develop an n-gram-based LM trained on a large corpus of raw text. Our experimental results show that a discriminative model trained on just a few thousand items in a generation treebank gives significantly better ranking performance than a traditional surface-oriented LM. Moreover, we show that even better results can be obtained by combining the two modeling approaches, which is done by including the LM as an additional feature in the discriminative model. Evaluation scores are reported for several data sets, using a range of automated metrics. We also include results from a manual evaluation carried out by a panel of external anonymous judges.
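As a rough illustration of how an LM score can enter a discriminative ranker as one more feature, the following minimal sketch scores candidate realizations with a log-linear (MaxEnt-style) model. The feature names, weights, and log-probabilities are invented for illustration and are not taken from the thesis; in practice the weights would be estimated from a generation treebank.

```python
import math
from typing import Dict, List

Features = Dict[str, float]

def score(weights: Features, feats: Features) -> float:
    """Linear score: dot product of feature values and their weights."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def rank(weights: Features, candidates: List[Features]) -> List[float]:
    """Conditional probability of each candidate realization (softmax over scores)."""
    scores = [score(weights, feats) for feats in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: two structural features plus the LM log-probability.
weights = {"rule:subj-verb": 0.8, "rule:topicalization": -0.5, "lm_logprob": 0.1}

# Two hypothetical realizations of the same input semantics.
candidates = [
    {"rule:subj-verb": 1.0, "lm_logprob": -12.0},
    {"rule:topicalization": 1.0, "lm_logprob": -10.0},
]

probs = rank(weights, candidates)
best = max(range(len(candidates)), key=probs.__getitem__)
```

In this toy setting the LM on its own would prefer the second candidate (higher log-probability), while the combined model prefers the first because the structural features outweigh the LM feature.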
The hybrid system for surface realization described in this thesis is currently used for target-language generation in the Norwegian–English machine translation (MT) system LOGON. We also show how the realization ranker is used together with a global end-to-end reranking model to select the final output of the MT system.