Abstract: Comparing Offline Sexual Behavior to Online Discourse on Sex and Substance Use of Young Black and Hispanic Men

◆ Eugene Jang, University of Southern California
◆ Yuanfeixue Nan, University of Southern California
◆ Yunwen (Kathy) Wang, Cedars-Sinai Medical Center
◆ Christopher J. Persaud, University of Southern California
◆ Lauren Jade Arnold, University of Southern California
◆ Louise Xie, University of Southern California
◆ Katrin Fischer, University of Southern California
◆ Robin Stevens, University of Southern California

Even as HIV infection is on the decline in the US (Crepaz et al., 2018), Black and Hispanic men (BHM) are among the most vulnerable to new HIV diagnoses (CDC, 2023). The risk of HIV infection among young Black and Hispanic men (YBHM) is amplified by substance use (Adimora et al., 2006; Ostrow et al., 2009). Overall, BHM have rates of illicit drug use similar to or lower than those of their white counterparts (Substance Abuse and Mental Health Services Administration, 2014). Still, BHM are more likely to experience negative consequences from drug use, and the parallel risks of HIV infection and illicit drug use (Boyer et al., 2000) indicate that efforts must be made to understand the motivations and attitudes related to substance use in order to develop effective interventions for these populations.

To fill this gap in the literature, we employ a convergent mixed-methods design that compares survey data with Twitter data from YBHM living in Los Angeles, examining how sexual behavior is associated with substance use and whether these patterns also appear in online discourse. The samples for both arms of the study were collected from YBHM on Twitter (now X) with shared socio-demographic profiles and neighborhoods. Twitter data was collected from July 2020 to February 2021 using the official Twitter Search RESTful API. Tweets were collected from neighborhoods with predominantly Black and Hispanic populations in the greater Los Angeles area (e.g., Compton and South-Central LA) using geolocation data. Natural language processing and classification methods were applied to predict users’ race or ethnicity (Black/Hispanic), gender (male), and age range (18-24). Likewise, detection of sexually explicit language was automated using the transformer models BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019). Our final dataset includes the complete timeline data of 4,058 Twitter users who were predicted likely to be young Black or Hispanic men living in Los Angeles (N = 7,676,707 tweets) (Weissenbacher, 2022).

The behavioral survey was conducted with YBHM living in the same predominantly Black and/or Hispanic neighborhoods in the greater Los Angeles area (N = 189, mean age = 22.83), who were recruited from December 2021 to October 2022 via ads on Twitter and Instagram. Survey items covered sexual health and risk behavior, substance use, and social media use.

With Twitter users’ timeline data, we conducted topic modeling with BERTopic (Grootendorst, 2022), which automatically discovers and organizes prominent topics within a large set of documents, here the tweets posted by YBHM. As a preliminary analysis, we conducted topic modeling on a randomly sampled 10 percent subset of tweets (n = 767,681). Among the 6,483 topics generated, trained human annotators read through the 600 most frequent topics and identified distinct topics related to substance use. We found that Twitter users who used sexual language were also talking about substances such as weed/marijuana, cocaine, and MDMA on Twitter, although not necessarily in the same tweet. The survey data reflected a similar finding: we found statistically significant correlations between substance use (i.e., weed/marijuana, non-prescribed drugs, cocaine, and ecstasy) and sexual behavior (i.e., number of sexual partners, unprotected anal sex, exchanging sex for money), which is also consistent with the literature. The Twitter data additionally reveals the discourse related to substance use motivations, attitudes, and behaviors. This study is significant in that it uses social media data to triangulate and contextualize the associations between sexual behavior and substance use found in the survey results.
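
The distinction drawn above, between users who discuss both sex and substances and tweets that mention both at once, can be illustrated with a minimal sketch. It is not the study's pipeline: the records, the user IDs, and the label names `sexual` and `substance` are hypothetical stand-ins for the classifier output and the annotated topics, and the toy data are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (user_id, labels assigned to one tweet).
# These are illustrative stand-ins, not data from the study.
TWEETS = [
    ("u1", {"sexual"}),
    ("u1", {"substance"}),
    ("u2", {"sexual", "substance"}),
    ("u3", {"sexual"}),
    ("u4", {"substance"}),
    ("u5", set()),
]

BOTH = {"sexual", "substance"}


def tweet_level_cooccurrence(tweets):
    """Count tweets that carry both labels at once."""
    return sum(1 for _, labels in tweets if BOTH <= labels)


def user_level_cooccurrence(tweets):
    """Return users whose timelines contain both labels, in any tweets."""
    labels_by_user = defaultdict(set)
    for user_id, labels in tweets:
        # Pool labels across all of a user's tweets.
        labels_by_user[user_id] |= labels
    return sorted(
        user for user, labels in labels_by_user.items() if BOTH <= labels
    )


if __name__ == "__main__":
    print(tweet_level_cooccurrence(TWEETS))
    print(user_level_cooccurrence(TWEETS))
```

In this toy example, pooling labels per user surfaces a user (`u1`) whose sexual and substance-related tweets are separate, which a tweet-level count would miss.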