Abstract

Textual response generation for multimodal task-oriented dialog systems, which aims to generate an appropriate textual response for a given multimodal context, is a pivotal yet challenging task. Although existing studies have made fruitful progress, they still suffer from two critical limitations: 1) they ignore relation knowledge, and 2) they lack representation-level regularization. To address these limitations, we propose a novel multimodal dialog system (HRMD) with dual semantic knowledge composition and representation-level semantic regularization.
Specifically, HRMD first acquires the attribute knowledge and relation knowledge related to the given multimodal context from the knowledge base, where the non-intuitive relation knowledge is extracted by an n-hop graph walk. Subsequently, HRMD composes the multimodal context with the acquired dual semantic knowledge to obtain the latent composed response representation, where the attribute knowledge is composed at the input token level and the relation knowledge at the intermediate representation level. Moreover, apart from the common output-level supervision for response generation, HRMD also incorporates representation-level semantic regularization: a set of latent query variables is devised to absorb the semantic information from the composed response representation and from the ground-truth response representation, respectively, which fulfills the representation-level regularization. Notably, this regularization allows HRMD to further utilize the composed response semantic representation to enhance response generation. Extensive experiments on a public dataset verify the superiority of the proposed HRMD. We also release the code and parameters to facilitate research in the community.
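To make the n-hop graph walk mentioned above more concrete, the following is a minimal sketch, not the released HRMD implementation. It assumes the knowledge base can be viewed as a set of directed (head, relation, tail) triples and that the walk starts from entities mentioned in the dialog context; the toy triples, the function names `build_adjacency` and `n_hop_walk`, and the breadth-first strategy are all illustrative assumptions.

```python
from collections import deque

# Hypothetical toy knowledge base of (head, relation, tail) triples.
# These entities and relations are made up for illustration only.
TRIPLES = [
    ("hotel_a", "located_in", "downtown"),
    ("downtown", "near", "museum_b"),
    ("museum_b", "next_to", "cafe_c"),
    ("hotel_a", "has_style", "modern"),
]


def build_adjacency(triples):
    """Index the triples by head entity (edges are treated as directed)."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def n_hop_walk(adjacency, seeds, n_hops):
    """Collect every triple reachable within n_hops of the seed entities.

    A breadth-first traversal is assumed here; each entity is expanded
    at most once, so no triple is collected twice.
    """
    collected = []
    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == n_hops:
            continue  # hop budget exhausted for this entity
        for relation, tail in adjacency.get(entity, []):
            collected.append((entity, relation, tail))
            if tail not in visited:
                visited.add(tail)
                frontier.append((tail, depth + 1))
    return collected


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    for triple in n_hop_walk(adjacency, ["hotel_a"], n_hops=2):
        print(triple)
```

With `n_hops=1` only the triples directly attached to the seed entity are returned; raising the hop count pulls in less obvious relations such as the link from `downtown` to `museum_b`, which is the kind of non-intuitive relation knowledge the abstract refers to.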

Framework
[Figure: the HRMD model framework]
Resources
Copyright

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

The code: [Download](password:vowo)
The test output file: [Download](password:aoyp)
