Research Area:  Machine Learning
Advances in neural sequence models and large-scale pretrained language models have substantially advanced natural language generation, achieving impressive performance on a variety of tasks. However, when users of such systems have a specific interest in what content to generate, these models fail to address such needs. To control the content of generated text more precisely, one can specify an aspect of interest, a semantic property of the current topic that identifies a particular subset of content, and perform aspect-based generation. For example, a user may be interested in learning about price from a collection of product reviews. Despite the potential utility of such methods, aspect-based generation has received relatively little research attention. One reason is the lack of data resources for training and applying models across a variety of domains. In addition, the little work that exists on aspect-based generation describes aspects in a simple form: a set of labels specifying the overall subtopics of the generated text. In reality, however, the content of a text should be specifiable at different granularities and with respect to other aspects of interest when necessary. In this thesis, we propose methods to address these issues, allowing for better control of the generated text.

This thesis consists of three parts. First, we address the lack of domain diversity in data sources for aspect-based summarization by reformulating Wikipedia article generation as multi-document aspect-based summarization. We examine summarization performance on 20 domains and highlight domain-specific challenges. Leveraging this dataset, we then explore zero-shot domain transfer of aspect-based summarization models, building models capable of handling arbitrary aspects at test time. Next, we focus on incorporating structure into aspects. In this part, we turn to scientific survey articles, which human authors organize using section structures. Using this structure, we formulate survey article generation as a summarization task and investigate the use of structural prompts for aspect-based summarization. In the last part, we aim to achieve fine-grained content control with fine-grained aspects. In particular, we learn a language model on Wikipedia texts about single entities by conditioning on a local knowledge base that stores information about various aspects of those entities. We design a model capable of automatically switching between token-by-token and aspect-based generation based on the likelihood of the text.
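To make the aspect-based summarization setting above concrete, here is a minimal sketch, not the method developed in the thesis: it conditions an off-the-shelf pretrained sequence-to-sequence model on an aspect by prefixing the aspect to the input. The checkpoint name (t5-small) and the "aspect: ... |" prompt format are illustrative assumptions, not details from the thesis.

    # Minimal sketch of aspect-based summarization via aspect-prefixed inputs.
    # Assumptions (not from the thesis): the "t5-small" checkpoint and the
    # "summarize: aspect: ... |" prompt format are for illustration only.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "t5-small"  # hypothetical stand-in for an aspect-based summarizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def aspect_summarize(documents, aspect, max_length=64):
        """Summarize a collection of documents with respect to one aspect."""
        # Concatenate the documents and prefix the aspect so the model
        # can condition its output on the requested subset of content.
        source = f"summarize: aspect: {aspect} | " + " ".join(documents)
        inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
        output_ids = model.generate(**inputs, max_length=max_length, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    reviews = [
        "The headphones sound great and the battery lasts all day.",
        "At $49 they are a bargain compared to similar models over $100.",
    ]
    print(aspect_summarize(reviews, aspect="price"))

The thesis itself trains models on aspect-annotated data across many domains rather than relying on a prompt prefix alone; the sketch only illustrates the interface of conditioning generation on a user-specified aspect.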
Name of the Researcher:  Hiroaki Hayashi
Name of the Supervisor(s):  Graham Neubig, Yulia Tsvetkov
Year of Completion:  2021
University:  Carnegie Mellon University
Thesis Link:   Home Page Url