Abstract: Predicting Affective Responses to Health Ads Using Computer-Extracted Musical Emotions: An Exploratory Study

◆ Soyoon Kim, University of Miami
◆ Ching-Hua Chuan, University of Miami

Music is one of the quintessential features shaping contemporary health messages, and it is widely accepted that music strongly influences listeners’ emotional states [1]. However, whereas the visual characteristics of health messages associated with emotional responses have received significant attention, only cursory auditory features, such as the presence or absence of background music (BGM) [2]-[3], have been identified and examined. While useful, such an approach reveals little about which parameters of the music contribute to its intrinsic emotional characteristics. By contrast, in other domains, such as music cognition and affective computing, researchers have begun to predict human emotions from a set of musical features computationally extracted from the audio [4]-[5]. Although this approach could provide a useful tool for health communication researchers, the two fields have yet to meaningfully intersect.
In this study, we took an interdisciplinary approach to examine the extent to which an individual’s affective responses to a health ad can be accurately inferred from computer-extracted musical emotions, i.e., the valence and arousal of the ad’s BGM. We also explored which other features of the ads may influence the retrospective emotional responses that were (un)predicted by the computed musical emotions. First, computer-estimated emotion ratings of music (activity, tension, and valence) were obtained using a MIRtoolbox algorithm developed from various parameters of film soundtracks [4]. Using this model, we generated computer-estimated arousal and valence ratings for the BGM of nine televised health ads. Second, each ad was rated by 25 participants to obtain self-reported valence and arousal scores. Third, other features of the ads were human-coded using a systematic scheme developed in message sensation value research [2], [6]. After obtaining all scores and features, we mapped each ad onto a two-dimensional space defined by self-reported arousal and valence, and visualized the computer-estimated valence (represented by cool-to-warm colors) and arousal (represented by circle size) for each ad to visually assess their relationships.
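The mapping described above, with self-reported scores as coordinates and computer-estimated scores as marker color and size, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `AdPoint` type, the helper names, the axis assignment (valence on x, arousal on y), the linear blue-to-red color ramp, the marker size range, and all numeric values are assumptions introduced here, and the sample values are placeholders rather than study data.

```python
from dataclasses import dataclass


@dataclass
class AdPoint:
    """One ad with self-reported and computer-estimated scores (hypothetical type)."""
    ad_id: str
    self_valence: float      # mean self-reported valence (assumed x-axis)
    self_arousal: float      # mean self-reported arousal (assumed y-axis)
    computed_valence: float  # computer-estimated valence of the BGM
    computed_arousal: float  # computer-estimated arousal of the BGM


def rescale(value, lo, hi):
    """Linearly map value from [lo, hi] to [0, 1]; return 0.5 if the range is empty."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def cool_to_warm(t):
    """Map t in [0, 1] to an RGB tuple running from blue (cool) to red (warm)."""
    return (t, 0.0, 1.0 - t)


def marker_specs(ads, min_size=50.0, max_size=500.0):
    """Build one marker description per ad.

    Position comes from the self-reported scores; color encodes computed
    valence and size encodes computed arousal, each rescaled across the ads.
    """
    valences = [a.computed_valence for a in ads]
    arousals = [a.computed_arousal for a in ads]
    specs = []
    for a in ads:
        t_val = rescale(a.computed_valence, min(valences), max(valences))
        t_aro = rescale(a.computed_arousal, min(arousals), max(arousals))
        specs.append({
            "ad_id": a.ad_id,
            "x": a.self_valence,
            "y": a.self_arousal,
            "color": cool_to_warm(t_val),
            "size": min_size + t_aro * (max_size - min_size),
        })
    return specs


# Placeholder values for illustration only; these are NOT data from the study.
ads = [
    AdPoint("ad_A", self_valence=2.0, self_arousal=6.0,
            computed_valence=0.8, computed_arousal=0.9),
    AdPoint("ad_B", self_valence=5.5, self_arousal=3.0,
            computed_valence=0.2, computed_arousal=0.1),
    AdPoint("ad_C", self_valence=4.0, self_arousal=4.5,
            computed_valence=0.5, computed_arousal=0.5),
]
specs = marker_specs(ads)
```

Each entry in `specs` could then be handed to a plotting library, for example as the `x`, `y`, `s`, and `c` arguments of matplotlib's `Axes.scatter`. In this placeholder data, `ad_A` pairs warm-colored (positively valenced) music with a low self-reported valence, the kind of mismatch such a plot is meant to make visible.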
The visual investigation indicated that while the self-reported arousal responses were well predicted by the computer-extracted arousal of the BGM, none of the self-reported valence scores were predicted by the computer-extracted valence. Moreover, ads with positively valenced BGM were mapped onto areas with low self-reported valence scores, indicating a potentially negative association between the two. Further examination of the human-coded ad features suggested that, in many ads demonstrating such negative associations, the BGM was used together with visual or content features, such as intense imagery, a twist at the end, and/or threatening taglines, which might be responsible for the negative emotions elicited by the ads. This suggests that future modeling should consider other message features that may influence emotional responses independently of, or in combination with, the musical aspect of the message. Despite its exploratory and descriptive nature, we believe this study is a meaningful first step in advancing the conventional approaches used in health-communication research to predict the effects of health ads based on audio-visual features.