Abstract: Variation in Media Coverage of Tobacco Affects Self-Reported Scanning: Evidence from Three Years of Weekly Content and Survey Data

◆ Kwanho Kim, University of Pennsylvania
◆ Robert Hornik, University of Pennsylvania
◆ Laura Gibson, University of Pennsylvania

Causal claims about the effects of media exposure on behavior are less persuasive when they rest on survey data in which both exposure and outcome are measured in the same instrument. Methodologists describe this as a concern about endogeneity: the temporal order between variables is uncertain, and an observed association may merely reflect the effects of confounding variables. One proposed solution is to make the measurement of exposure independent of the measurement of the outcome, specifically by assessing exposure indirectly through content analysis of the media environment. This solution assumes that as the media environment varies in its content, over time or across places, individual exposure to that content varies as well (Kelly et al., 2009; Niederdeppe, 2014; Liu & Hornik, 2016). The study reported here tests this assumption.
Every day between mid-2014 and mid-2017, we collected both long-form texts (from broadcast TV and radio news transcripts, the AP newswire, 50 popular U.S. newspapers, and more than 100 websites popular among young people) and all tobacco-relevant tweets. Over this period, we collected a total of 125,165 long-form texts and about 51 million tweets. The texts were identified using a combination of dictionary and supervised machine-learning text analysis tools (Gibson et al., 2019). During the same period, we also collected weekly rolling cross-sectional, nationally representative phone survey data, which eventually included 11,847 U.S. youth and young adults. The survey instrument contained a question asking how frequently respondents had come across smoking-related information (called scanning): “In the past 30 days, did you come across information about cigarettes or tobacco online, in the media, or from other people even when you were not actively looking for it?” (if yes) “Did you come across such information once or twice, three to ten times, or more times than that?” The mean for this measure, calculated using midpoints of categories (e.g., once or twice = 1.5), was 2.15, with a standard deviation of 3.15.
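The midpoint coding of the scanning item can be illustrated with a minimal sketch. Only the value for "once or twice" (1.5) is stated above; the category labels, the score of 0 for a "no" answer, the 6.5 midpoint for "three to ten times", and especially the value assigned to the open-ended top category are assumptions made here for illustration, not the coding used in the study.

```python
from statistics import mean, stdev
from typing import Optional

# Hypothetical category labels and scores. Only "once or twice" = 1.5
# is given in the abstract; every other value is an assumption.
MIDPOINTS = {
    "once_or_twice": 1.5,   # stated in the abstract
    "three_to_ten": 6.5,    # assumed: arithmetic midpoint of 3 and 10
    "more_than_ten": 12.0,  # assumed: open-ended category has no true midpoint
}


def scanning_score(came_across: bool, frequency: Optional[str] = None) -> float:
    """Convert the two-part scanning item into a numeric frequency score."""
    if not came_across:
        return 0.0  # assumed: no scanning in the past 30 days scores zero
    if frequency not in MIDPOINTS:
        raise ValueError(f"unknown frequency category: {frequency!r}")
    return MIDPOINTS[frequency]


# Made-up responses, only to show how a sample mean and SD would be computed.
responses = [
    (False, None),
    (True, "once_or_twice"),
    (True, "three_to_ten"),
    (False, None),
]
scores = [scanning_score(yes, freq) for yes, freq in responses]
sample_mean = mean(scores)
sample_sd = stdev(scores)
```

Because the top category is open-ended, the reported mean and standard deviation depend on whatever value is assigned to it, which is why that choice is flagged as an assumption in this sketch.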
We examined whether media coverage (long-form and Twitter) about tobacco products, as measured by the content analysis, predicted survey-reported scanning of smoking-related information during the corresponding period. Each respondent was assigned content scores reflecting the average volume of long-form (mean = 2191, SD = 314) and Twitter coverage (mean = 44194, SD = 9637) about tobacco products over the previous 28 days. The results from regression analysis, with standard errors clustered by the 1,140 interview dates and both content variables standardized, indicated that the volume of Twitter coverage about tobacco products other than electronic cigarettes was positively associated with the frequency of scanning smoking-related information (B = .045, robust standard error = .016, p < .01). Long-form coverage was not significantly associated with frequency of scanning (B = -.008, robust standard error = .007, ns). Our findings provide mixed evidence that content-analytic estimates of variation in media coverage predict variation in survey-measured self-reports of exposure. Further considerations will include questions about possible thresholds for effects and the match between the content analyzed and the respondent sample populations.
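The assignment of content scores described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes coverage is available as daily counts keyed by date, that "the previous 28 days" excludes the interview day itself, and that standardization uses the population standard deviation. The function names are hypothetical, and the clustered regression itself is not shown.

```python
from datetime import date, timedelta
from statistics import mean, pstdev
from typing import Dict, List


def trailing_mean(daily_counts: Dict[date, float],
                  interview_date: date,
                  window_days: int = 28) -> float:
    """Average coverage volume over the window_days before the interview.

    Assumption: the window excludes the interview day, and days missing
    from daily_counts are skipped rather than counted as zero.
    """
    window = [interview_date - timedelta(days=k)
              for k in range(1, window_days + 1)]
    values = [daily_counts[d] for d in window if d in daily_counts]
    if not values:
        raise ValueError("no coverage data in the window")
    return mean(values)


def standardize(scores: List[float]) -> List[float]:
    """Convert scores to z-scores (mean 0, SD 1) across respondents."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sigma for s in scores]


# Made-up daily counts: 10 items per day for 28 days, then 38 on the last day.
start = date(2015, 1, 1)
counts = {start + timedelta(days=i): 10.0 for i in range(28)}
counts[start + timedelta(days=28)] = 38.0

# Two hypothetical respondents interviewed one day apart.
score_a = trailing_mean(counts, start + timedelta(days=28))  # covers days 0..27
score_b = trailing_mean(counts, start + timedelta(days=29))  # covers days 1..28
z_scores = standardize([score_a, score_b])
```

With standardized content variables, each regression coefficient is read as the change in scanning frequency associated with a one standard deviation change in coverage volume, which is how the B values reported above would be interpreted under this setup.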