INTRODUCTION

Current research in the United States indicates that young adults (18–24 years) have the highest prevalence of using any type of tobacco product (37.6%)1,2. Young adults are at risk of using conventional tobacco products (e.g. cigarettes), new and emerging tobacco products (e.g. e-cigarettes), and multiple products at the same time3-5. In particular, young adults in community college represent an underserved population that is more likely to use tobacco than young adults attending 4-year universities6-8. Tobacco use among young adults can be attributed to their relatively lower knowledge and perception of the risks of tobacco, compared to other adult age groups9-16. As a result, there is a need for campaigns that communicate tobacco risks to young adults17,18.

The dissemination of text messages via mobile phones is a growing strategy for tobacco-risk communication9,19. Considering that 96% of American young adults own mobile phones capable of receiving texts, text messaging is a particularly appropriate method for transmitting information to this population20,21. While this strategy has been successful22-24, little has been done to design validated text messages for tobacco-risk communication. Furthermore, a validated library of text messages has yet to be reported in the literature.

Researchers have constructed text messages for risk communication based on three main structures: framing (gain-framed or loss-framed messages), depth (simple or complex messages), and appeal (emotional or rational messages)25-28. Compared to depth or appeal, developing text messages based on framing has been relatively easy due to the well-established conceptualization of gain- and loss-framing29,30. In short, gain-framed messages emphasize the benefits of quitting or avoiding substance use, while loss-framed messages emphasize the costs of use. Such a conceptualization of framing has been consistently applied by researchers in health promotion and disease prevention31.

Researchers face challenges when developing messages based on depth and appeal. The diversity of message features that depict depth and appeal makes it difficult to effectively construct a text message. In the context of depth, message complexity has been defined in many ways, based on message structure, content, or both32. While structure may involve complex grammatical applications, content can include longer words33-35. Research suggests that message complexity can have an effect on tobacco-risk communication, and the success of such messages depends on an individual’s level of need for cognition36. Individuals with low need for cognition are more likely to express intention to quit when exposed to simple messages, whereas individuals with high need for cognition are more likely to be persuaded by complex messages36. In terms of appeal, research on text message development has conceptualized emotional messages based on linguistic cues and paralinguistic cues. Linguistic cues include the use of emotional words (e.g. ‘happy’, ‘angry’)37 and linguistic markers (i.e. expressing emotion without emotional words, e.g. ‘I want to thank you so much’)37-39. Paralinguistic cues in text messages express nonverbal cues that are normally communicated physically. There are five types of paralinguistic cues: vocal spelling (mimicking a specific vocal inflection, e.g. ‘weeeell’, ‘soooo’), lexical surrogates (textual representations of vocal sounds that are not words, e.g. ‘uh-huh’, ‘haha’), spatial arrays (pictographs constructed from punctuation and letters, e.g. :-) for happy face), manipulation of grammatical markers (alterations of the presentation of words, e.g. all capital letters, strings of periods or commas), and minus features (deliberate or inadvertent neglect of conventional formatting elements, e.g. lack of capitalization or paragraphing)40. 
Previous studies have shown that emotionally evocative messages using fear appeal or humor can be successful for tobacco-risk communication, as they facilitate recall of message content, increase tobacco-related knowledge and motivate users to quit tobacco27,41,42.

Beyond manual analysis of text, communication researchers have developed software programs that automatically analyze the content of text. Such programs categorize message content based on themes previously identified through extensive manual coding. In particular, we focused on the Linguistic Inquiry and Word Count (LIWC), a computerized text-analysis program that counts the frequency of words and word stems to study the emotional, cognitive, structural and process components in written text or speech43-45. Considering the complexity of message design with respect to depth and appeal, the LIWC procedure can automatically and quantitatively identify text message features that differentiate message depth and appeal. This study is the first to apply the LIWC procedure to validate text messages in the context of tobacco research.

At the Texas Tobacco Center of Regulatory Science (Texas-TCORS), our researchers have developed a library of mobile phone text messages categorized on framing, depth and appeal. The objective of the text messages is to communicate the risks of tobacco use to young adults, both users and nonusers. While the Texas-TCORS library of text messages has been developed through extensive formative research, message categorization has yet to be objectively validated. Predictive validity of the library is crucial as it will ensure that the messages are correctly designed based on their category, and it will allow researchers to conduct randomized trials based on the message categories with confidence. Results from trials can guide potentially more effective communication campaigns. In addition, as we examine message content, we may be able to extract new themes emerging from the messages. Using the LIWC procedure, this paper aims to: 1) validate the library of gain- and loss-framed text messages based on appeal and depth, 2) identify messages that may be improved, and 3) explore additional categories of message design in the Texas-TCORS text message library.

METHODS

Text message development

From January 2014 to August 2015, the Texas-TCORS researchers developed 976 messages, taking into account previous scientific literature, trends in social media related to tobacco product use and trending terminology. Collectively, the research team has extensive experience in tobacco-risk communication, public health, psychology and creative writing. Message development also involved focus groups conducted with young adults, and external experts in health communication, tobacco control and public health. Overall, the focus group discussions indicated that the messages were perceived as interesting and appropriate. Feedback on the messages was incorporated into message revisions, to reach a final version of the messages46.

The library included 976 text messages, developed according to framing, depth and appeal. A permutation of the three structures was implemented to have eight categories of text messages (2x2x2): gain-framed/simple/emotional, loss-framed/simple/emotional, gain-framed/complex/emotional, loss-framed/complex/emotional, gain-framed/simple/rational, loss-framed/simple/rational, gain-framed/complex/rational, and loss-framed/complex/rational (122 messages per category).
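The 2x2x2 design can be made concrete with a minimal sketch that enumerates the eight categories and checks the library size. The variable and function names are illustrative assumptions; only the category labels and the count of 122 messages per category come from the text.

```python
from itertools import product

FRAMING = ("gain-framed", "loss-framed")
DEPTH = ("simple", "complex")
APPEAL = ("emotional", "rational")
MESSAGES_PER_CATEGORY = 122  # reported in the text


def message_categories():
    """Return every framing/depth/appeal combination as a label."""
    return ["/".join(combo) for combo in product(FRAMING, DEPTH, APPEAL)]


categories = message_categories()
library_size = len(categories) * MESSAGES_PER_CATEGORY
print(len(categories), library_size)
```

Eight categories of 122 messages each give the 976 messages in the library.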

Messages describing conventional products included information about combustible cigarettes, cigars, cigarillos, smokeless tobacco and pipes. Messages about new and emerging products included information about e-cigarettes, vapes, electronic liquids and hookah (waterpipes).

Study sample

The library comprised the study sample (N=976 messages). The unit of analysis was the individual text message, defined as what one individual may receive via mobile phone. This included the total content of the message, regardless of its length (from a single word to multiple paragraphs). Half of the messages were gain-framed and the others were loss-framed. Message length varied between 52 and 172 characters. Examples of text messages as categorized initially by writers are presented in Table 1.

Table 1

Examples of text messages as designed by writers

Appeal | Gain-framed, simple | Gain-framed, complex | Loss-framed, simple | Loss-framed, complex
Emotional | Yummy, pie! Nonsmokers can appreciate every single bite of homemade apple pie since the nicotine in cigs hasn't ~ messed up their taste buds! :P | Avoiding cigarettes prevents halitosis. Nonsmokers are not exposed to the disgusting sulfur compounds ~ in tobacco that cause putrid, chronic morning breath! :) | Mike had a hot date on Friday but wouldn't stop smoking cigs with his pals. ~ Now he has rotting yellow teeth & an imaginary date. :( | Devastating news! Smoking ‘light’ cigarettes will not protect the body from toxicity. ~ All cigarettes rip away approximately a decade from a smoker's lifespan :(
Rational | With 7000+ toxic chemicals in cig smoke, the chances of a nonsmoker getting cancer is really low. Why? ~ They aren't exposed to 60+ cancer-causing chemicals. | Were you aware? Avoiding exposure to chemicals in tobacco smoke can prevent premature skin aging. ~ Nonsmokers maintain skin elasticity by avoiding cigarettes. | Because smoking cigs raises ppl's risk of a heart attack, using it is a bad health choice. | Annually, 16 million Americans have at least one severe disease due to smoking. ~ Smoking any cigarette, even ‘lights’, leads to difficulties maintaining body condition.

Content analysis

The messages were coded using LIWC, which is a valid and reliable method for content analysis of text47. LIWC version 2015 codes for 103 variables using a full dictionary of words48. For each variable, LIWC calculates the percentage of words in a message that tap into that variable (i.e. the number of category words in a single message, divided by the total number of words in that message). These ratios, as opposed to word counts, were used to account for differences between messages in the amount of content45.
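The ratio described above can be illustrated with a short sketch. It is not the LIWC software: the mini-dictionary, the category names and the tokenizer are hypothetical stand-ins, since the real LIWC2015 dictionary is proprietary and far larger. A trailing asterisk is assumed to mark a word stem.

```python
import re

# Hypothetical mini-dictionary for illustration only (NOT the LIWC2015
# dictionary). A trailing "*" marks a word stem.
TOY_DICTIONARY = {
    "affect": ["happy", "bad", "risk*", "danger*"],
    "cogproc": ["think", "because", "perhaps"],
}


def tokenize(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", message.lower())


def matches(token, entry):
    """True if the token equals the entry, or starts with it when stemmed."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentage(message, category, dictionary=TOY_DICTIONARY):
    """Percentage of the message's words that fall in the given category."""
    tokens = tokenize(message)
    if not tokens:
        return 0.0
    entries = dictionary[category]
    hits = sum(any(matches(tok, e) for e in entries) for tok in tokens)
    return 100.0 * hits / len(tokens)
```

Under these assumptions, a seven-word message containing two affect words scores 2/7 (about 28.6%) on affect, regardless of how long other messages in the library are.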

Measures

Using LIWC, we measured message characteristics indicative of depth and appeal, as supported by previous literature3,37,49-54. Message depth was measured through: word length (i.e. frequency of words with more than six letters) and word count (i.e. number of words per message)52,53. Message appeal was measured using characteristics that distinguish between emotional and rational messages. To capture emotionality, we measured affect (i.e. frequency of words expressing overall emotion, such as ‘cheerful’, ‘hopeful’, and ‘humor’) and subcategories of affect including negative emotions (e.g. ‘hurt’, ‘mad’, and ‘risk’), positive emotions (e.g. ‘happy’, ‘cheerful’, and ‘thankful’), anger (e.g. ‘rage’, ‘anger’, and ‘aggressive’), anxiety/fear (e.g. ‘anxious’, ‘avoid’, and ‘afraid’), and sadness (e.g. ‘sad’, ‘alone’, and ‘cry’)37. To capture rationality, we measured two variables: cognitive processing (i.e. frequency of words depicting cognitive processing such as ‘think’, ‘decide’, and ‘perhaps’) and quantification (i.e. frequency of words depicting amounts such as numbers, ‘much’, ‘many’, and ‘few’)54.
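The two depth measures can be approximated with a minimal sketch. The tokenization rules are assumptions rather than LIWC's documented behavior: hyphenated compounds are split into separate words, and whitespace-delimited tokens, including emoticons, each count as one word. The function name is hypothetical.

```python
import re


def depth_metrics(message):
    """Return (word_count, percent_of_words_longer_than_six_letters)."""
    # Split hyphenated compounds ("Menthol-release" -> two words) but leave
    # emoticons such as ":-)" intact, then split on whitespace.
    text = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", message)
    tokens = text.split()
    if not tokens:
        return 0, 0.0
    # Count only letters, so attached punctuation does not inflate length.
    long_words = sum(
        1 for tok in tokens if sum(ch.isalpha() for ch in tok) > 6
    )
    return len(tokens), round(100.0 * long_words / len(tokens), 2)
```

With these rules, the sketch reproduces the word count and word length values reported for the example messages in the Results (for instance, 21 words with 23.81% over six letters), although agreement on other messages is not guaranteed.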

Finally, we coded for nine new themes using LIWC, based on common themes tackled when communicating tobacco risks3,49-51. In particular, we identified terms related to: health (e.g. ‘diagnosis’, ‘healthy’, and ‘cancer’), death (e.g. ‘death’ and ‘lethal’), social (e.g. ‘parents’, ‘kids’, and ‘friends’), leisure (e.g. ‘bar’ and ‘restaurant’), religion (e.g. ‘God’, ‘pray’, and ‘blessing’), body (e.g. ‘lungs’, ‘skin’, and ‘heart’), work and marketing (e.g. ‘company’ and ‘job’), money (e.g. ‘cost’ and ‘buy’), and sexuality (e.g. ‘pregnant’ and ‘erection’).

Data analysis

We analyzed the text messages separately for depth and appeal. In order to validate the messages based on depth, we first conducted one-way analysis of variance (ANOVA) to determine if message categorization based on depth is related to higher scores on word count and word length. We compared the messages designed by writers to be complex with messages designed to be simple with respect to word count and word length. Then, we conducted multiple logistic regression analysis predicting message design as simple, with word count and word length as the main independent variables, controlling for appeal, framing, affect, cognitive processing and type of nicotine/tobacco product per message.
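As an illustration of the one-way ANOVA step, the following sketch computes the F statistic and eta-squared from raw scores using only the standard library. The word counts are hypothetical toy values, not data from the study, and the function name is an assumption.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for the given groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical word counts, not data from the study.
simple_counts = [18, 20, 19, 21, 22]
complex_counts = [24, 26, 25, 27, 23]
f_stat, df_b, df_w, eta_sq = one_way_anova(simple_counts, complex_counts)
```

With two groups, as in the simple versus complex comparison, the between-groups degrees of freedom equal 1, which matches the F[1, 974] form reported in the Results.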

In order to validate the messages based on appeal, we conducted one-way ANOVA to determine if message categorization by writers is related to higher scores on emotional and rational variables identified by LIWC. We compared the messages designed by writers to be emotional to the messages designed to be rational with respect to affect, stress, anxiety/fear, anger, cognitive processing and quantity. We conducted multiple logistic regression analysis predicting message design as emotional, with affect and cognitive processing as the main independent variables, controlling for depth, framing, word count, word length and type of nicotine/tobacco product per message. For the ANOVAs, a Bonferroni adjustment corrected alpha for repeated comparisons and guarded against type I error55-57.
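The Bonferroni adjustment divides the nominal alpha by the number of comparisons. A minimal sketch follows; the nominal alpha of 0.05, the function names and the p-values are assumptions for illustration, as the text does not state the number of comparisons used.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni adjustment."""
    return alpha / n_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Map each comparison name to whether it survives the adjustment."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values, not results from the study.
example = {"variable_a": 0.001, "variable_b": 0.02, "variable_c": 0.2}
flags = significant_after_bonferroni(example)
```

Here three comparisons lower the threshold to about 0.0167, so a p-value of 0.02 that would pass an unadjusted 0.05 test no longer counts as significant.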

To identify messages showing disagreement between writers’ design and objective coding of depth, we constructed a scatter plot of word count versus word length, stratifying between messages designed to be simple and messages designed to be complex. Messages designed to be simple with word length and word count above the medians were identified as messages that disagree with objective coding through LIWC. Likewise, messages designed to be complex with word length and word count below the medians were identified as messages that disagree with objective coding through LIWC.

Similarly, to identify the messages that indicate disagreement between categorization by writers and objective coding of appeal, we presented a scatter plot of affect versus cognitive processing, stratifying between messages designed to be emotional and messages designed to be rational. Messages designed to be emotional with cognitive processing higher than the median and affect lower than the median were identified as messages that disagree with objective coding by LIWC. Also, messages designed to be rational with cognitive processing lower than the median and affect higher than the median were identified as messages that do not match objective coding.
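The two median-based rules for depth and appeal can be sketched as follows. Function names and the use of strict inequalities are assumptions, since the text does not say how scores equal to a median are handled. The medians are passed in as parameters.

```python
def flag_depth(design, word_count, word_length, median_count, median_length):
    """True if the writers' depth label disagrees with both depth measures."""
    above_both = word_count > median_count and word_length > median_length
    below_both = word_count < median_count and word_length < median_length
    return (design == "simple" and above_both) or (
        design == "complex" and below_both
    )


def flag_appeal(design, affect, cogproc, median_affect, median_cogproc):
    """True if the writers' appeal label disagrees with both appeal measures."""
    looks_rational = cogproc > median_cogproc and affect < median_affect
    looks_emotional = cogproc < median_cogproc and affect > median_affect
    return (design == "emotional" and looks_rational) or (
        design == "rational" and looks_emotional
    )
```

For example, using the medians reported in the Results (word count 23, word length 27), a message designed to be complex with 21 words and 23.81% long words would be flagged, whereas one with 29 words and 10.34% long words would not, because only one of its two measures falls below the median.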

RESULTS

We computed descriptive statistics for the variables of interest after the LIWC procedure, for both loss-framed and gain-framed messages (Table 2). Word count and the frequencies of affect, positive emotions and anxiety/fear differed significantly between gain-framed and loss-framed messages.

Table 2

Descriptive statistics for variables of interest after LIWC procedure (N=976)

Variables | Gain-framed | Loss-framed | Total | F | p | η2
Message depth: Complex
Word count^a | 23.30 (4.15) | 22.60 (4.33) | 22.88 (4.43) | 6.51 | 0.011 | 0.010
Word length | 28.33 (12.04) | 28.31 (12.13) | 28.23 (12.16) | <0.001 | 0.982 | <0.001
Message appeal: Emotional
Affect | 12.45 (7.48) | 8.83 (6.26) | 10.61 (7.14) | 66.78 | <0.001 | 0.064
Positive emotions | 5.66 (5.47) | 1.48 (2.69) | 3.56 (4.79) | 229.56 | <0.001 | 0.191
Negative emotions | 6.73 (5.53) | 7.28 (5.71) | 6.98 (5.63) | 2.38 | 0.123 | 0.002
Anxiety/Fear | 3.89 (3.66) | 1.87 (2.82) | 2.87 (3.42) | 92.77 | <0.001 | 0.087
Anger | 0.73 (1.81) | 0.86 (2.03) | 0.79 (1.92) | 1.11 | 0.292 | 0.001
Message appeal: Rational
Cognitive processing | 10.12 (7.01) | 10.70 (7.53) | 10.39 (7.29) | 1.49 | 0.22 | 0.001
Quantity | 1.92 (2.91) | 1.84 (2.77) | 1.88 (2.84) | 0.20 | 0.65 | <0.001

a Word count is the only variable that is not a frequency or ratio.

Validity of message categorization based on depth

Tests of message depth validity indicated that the messages designed to be complex had a significantly higher number of words per message (F[1, 974]=72.80, p<0.001, η2=0.07) and a higher frequency of words over six letters (F[1, 974]=562.25, p<0.001, η2=0.37). Messages designed to be complex were more likely to present longer words and a higher number of words than messages designed to be simple. Supporting these results, logistic regression analysis indicated that the higher the word count and word length of a message, the more likely it was designed as complex, controlling for message framing, message appeal, affect, cognitive processing and type of nicotine/tobacco product mentioned in the message (Table 3).
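For a one-way ANOVA, eta-squared can be recovered from the F statistic and its degrees of freedom, which offers a quick consistency check on the values above. The sketch below uses the standard identity η2 = (F × df1) / (F × df1 + df2); the function name is an assumption.

```python
def eta_squared_from_f(f_stat, df_between, df_within):
    """Eta-squared implied by a one-way ANOVA F statistic."""
    explained = f_stat * df_between
    return explained / (explained + df_within)


word_count_eta = round(eta_squared_from_f(72.80, 1, 974), 2)
word_length_eta = round(eta_squared_from_f(562.25, 1, 974), 2)
```

Applied to the reported statistics, the identity gives 0.07 for word count and 0.37 for word length, in agreement with the effect sizes in the text.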

Table 3

Logistic regression model predicting message categorization by writers based on depth (N = 976)

Outcome: messages designed to be simple
Predictor | OR (SE)^a | 95% CI^b | p
Word count | 0.62 (0.02) | 0.57-0.66 | <0.001
Word length | 0.77 (0.01) | 0.75-0.80 | <0.001
Gain-framed | 1.21 (0.26) | 0.79-1.86 | 0.376
Designed to be emotional versus rational | 1.78 (0.47) | 1.06-3.00 | 0.029
Affect | 1.04 (0.02) | 1.00-1.08 | 0.026
Cognitive processing | 1.02 (0.01) | 0.99-1.05 | 0.171
New and emerging product | 0.92 (0.19) | 0.61-1.39 | 0.695
Model χ2=744.52 | | | <0.001

a Indicates odds ratio followed by standard error in parentheses.

b Indicates 95% confidence interval.
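To aid interpretation of the odds ratios in Table 3, the sketch below shows how a per-unit odds ratio compounds over several units, a standard property of logistic regression when the other predictors are held constant. The function name is an assumption, and the five-word difference is an arbitrary illustration.

```python
def odds_multiplier(odds_ratio, units):
    """Factor by which the odds change after `units` one-unit increases."""
    return odds_ratio ** units


# Word count OR from Table 3 (outcome: designed to be simple).
five_more_words = odds_multiplier(0.62, 5)
```

With an odds ratio of 0.62 per additional word, a message that is five words longer has roughly 0.09 times the odds of having been designed as simple, all else being equal.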

Validity of message categorization based on appeal

Messages designed to be emotional were more likely to exhibit words of affect, positive emotions, negative emotions, anxiety and anger, than messages designed to be rational (Table 4). There was no significant difference between rational and emotional messages, with respect to words indicating sadness. Messages designed to be rational were more likely to present words using cognitive processing and quantity than messages designed to be emotional (Table 4).

Table 4

Analysis of variance comparing message categories based on appeal (N=976)

Variables | Rational | Emotional | F | p | η2
Emotion-related variables
Affect | 7.35 (6.27)^a | 13.94 (6.38) | 265.05 | <0.001 | 0.21
Positive emotion | 1.67 (2.96) | 5.47 (5.47) | 182.40 | <0.001 | 0.16
Negative emotion | 5.61 (5.45) | 8.40 (5.45) | 64.01 | <0.001 | 0.06
Anxiety and fear | 2.49 (3.05) | 3.27 (3.72) | 12.66 | <0.001 | 0.01
Anger | 0.43 (1.50) | 1.15 (2.21) | 36.13 | <0.001 | 0.03
Sad | 0.88 (2.13) | 1.05 (2.34) | 1.42 | 0.233 | <0.01
Cognitive-related variables
Cognitive processing | 11.70 (7.94) | 9.14 (6.30) | 31.24 | <0.001 | 0.03
Quantity | 2.16 (3.14) | 1.61 (2.48) | 9.31 | 0.002 | <0.01

a Values show mean and standard deviation.

Logistic regression analysis indicated that a higher frequency of affect words and a lower frequency of cognitive processing words were related to more emotional messages, controlling for message framing, message depth, word count, word length and type of nicotine/tobacco product mentioned in the message (Table 5). Among all emotion-related variables, affect explained the most variance (21%) in message categorization by writers. Between the two cognitive-related variables, cognitive processing explained more variance (3%) in message categorization by writers.

Table 5

Logistic regression analysis predicting messages designed by writers based on appeal (N=976)

Outcome: messages designed to be emotional
Predictor | OR (SE)^a | 95% CI^b | p
Affect | 1.23 (0.02) | 1.19-1.27 | <0.001
Cognitive processing | 0.96 (0.01) | 0.94-0.99 | 0.004
Gain- versus loss-framed | 0.38 (0.07) | 0.27-0.54 | <0.001
Designed to be simple versus complex | 1.77 (0.44) | 1.08-2.89 | 0.023
Word count | 1.37 (0.01) | 1.29-1.45 | <0.001
Word length | 1.01 (0.01) | 0.99-1.03 | 0.137
New and emerging product | 1.47 (0.25) | 1.05-2.05 | 0.025
Model χ2=461.49 | | | <0.001

a Indicates odds ratio followed by standard error in parentheses.

b Indicates 95% confidence interval.

Identification of messages needing improvement

Overall, scatter plot results indicated consistency between categorization by writers and LIWC-coding of depth and appeal features (Fig. 1). Messages scattered above the medians in word length (median=27) and word count (median=23) (Fig. 1a, Quadrant 2) were all designed to be complex (n=258; 52.87% of all complex messages). On the other hand, 2.0% of complex messages (n=10) and 43.03% of simple messages (n=210) were scattered below the medians in word length and word count (Fig. 1a, Quadrant 3). In other words, 95.45% of text messages scattered in this quadrant were designed to be simple. The 10 complex messages scattered below the medians deserve further improvement. However, such messages are very close to the medians (Fig. 1a). One example is ‘Cigarettes may be appealing to some youth, however young people who do not smoke are more likely to avoid impaired cognition’ (Word count=21 words, Word length=23.81% of words over six letters). No message designed to be simple scored above the medians in both word count and word length (Quadrant 1). However, Figure 1a indicates that 18.65% of simple messages scored above the median in word length alone (Quadrants 1 and 2). One example is ‘Menthol-release cigs [cigarettes] let ppl [people] add menthol to regulars, by pinching a bead. Smoking them makes ppl targets of the tobacco industry’, categorized by writers as loss-framed/simple/rational (Word count=22 words, Word length=40.91% of words over six letters). Also, 19.88% of complex messages (n=97) scored below the median in word length, despite high scores in word count. One example is ‘Oh dear! Lung cancer is always a threat to be concerned about even with “light” cigarettes. All cigarettes prey on the body’s organs and leave them to rot :(’, categorized by writers as loss-framed/complex/emotional (Word count=29 words, Word length=10.34% of words over six letters).
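The depth percentages above follow from simple proportions, sketched below. The denominator of 488 messages per depth level is an inference from the design (four categories of 122 messages each are simple and four are complex); the helper name is an assumption.

```python
MESSAGES_PER_DEPTH_LEVEL = 4 * 122  # 488 simple and 488 complex messages


def percent(part, whole):
    """Share of `whole` represented by `part`, as a rounded percentage."""
    return round(100.0 * part / whole, 2)


complex_above_medians = percent(258, MESSAGES_PER_DEPTH_LEVEL)
simple_below_medians = percent(210, MESSAGES_PER_DEPTH_LEVEL)
simple_share_of_quadrant = percent(210, 210 + 10)
complex_short_words = percent(97, MESSAGES_PER_DEPTH_LEVEL)
```

Note that the within-quadrant share (95.45%) uses the number of messages in the quadrant as its denominator, whereas the other figures use the number of messages at that depth level.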

Figure 1
https://www.tobaccopreventioncessation.com/f/fulltexts/84866/TPC-4-7-g001_min.jpg

In the context of message appeal (Fig. 1b), several messages agreed with LIWC coding. According to the findings, 41.60% of rational messages (n=203) and 28.69% of emotional messages (n=70) were scattered below the median in affect (median=9.55), and above the median in cognitive processing (median=9.50) (Fig. 1b, Quadrant 4). In other words, 74.36% of text messages scattered in this quadrant are designed by writers as rational. Even though the 70 emotional messages were in quadrant 4, they all presented some level of affect. None of the messages designed to be emotional scored zero on affect (M=13.94, SD=6.38, range score in affect 3.45-36.36).

On the other hand, 41.60% of emotional messages (n=203) and 28.69% of rational messages (n=70) were scattered above the median in affect, and below the median in cognitive processing (Fig. 1b, Quadrant 1). In other words, 74.36% of text messages scattered in this quadrant are designed by writers as emotional. The 70 rational messages in quadrant 1 may need further improvement, considering that 51 of these messages scored zero on cognitive processing. Also, 89% of the 51 messages exhibited words of affect such as ‘help’, ‘avoid’ and ‘risk’. Two messages are ‘Avoiding hookah prevents exposure to high amounts of benzene (a carcinogen), which decreases the risk of developing acute non-lymphocytic leukemia (ANLL)’ and ‘By avoiding cigarettes, young people reduce their risk of developing Buerger’s disease and maintain healthy circulation to extremities’ (Fig. 1b, Quadrant 1).

The scatter plot also indicated that 377 (77.25%) of the messages that were designed to be rational were coded by LIWC with some level of affect. These messages used words such as ‘bad’, ‘dangerous’ and ‘serious’. Examples of messages are ‘MYTH: Menthols aren’t as bad as regular cigs. TRUTH: Smoking either is dangerous’, categorized by writers as loss-framed/simple/rational (frequency of affect words=23.08%, frequency of cognitive processing words=23.08%) and ‘Drinking e-cig juice can cause serious nicotine poisoning. Toddlers who live around e-cig users are at risk of poisoning’, categorized by writers as loss-framed/simple/rational (frequency of affect words=21.05%, frequency of cognitive processing words=5.26%).

New categories of messages in the Texas-TCORS Library

We coded for nine themes in the Texas-TCORS Library, describing common topics for risk communication (Table 6). The most common theme is the description of social situations and social connections, with words such as ‘parents’, ‘kids’ and ‘friends’. This theme is coded separately from leisure, which involves terms such as ‘bars’ and ‘hanging out’. The least common theme is the use of religious terms such as ‘blessing’, ‘pray’ and ‘demon’.

Table 6

Identified themes for nicotine and tobacco-risk communication in messages from the Texas-TCORS Library (N=976)

Themes | N (%)^a | M (SD)^b | Examples of messages
Social | 761 (77.73) | 8.47 (7.20) | ‘Parents who smoke cigs risk their kid’s health by raising their chances of getting asthma.’ ‘Ppl who say yes to their friends’ offers to use snus tobacco may develop a social & physical addiction.’
Health | 734 (75) | 6.06 (5.06) | ‘Ppl who don’t use hookah help keep their lungs safe from infection since hookah pipes can be filled with fungi.’ ‘Don’t forget: smoking cigs while pregnant isn’t the only way to raise a baby’s risk of asthma. Another way to raise the baby’s risk is to be around secondhand smoke.’
Body | 373 (38.22) | 2.54 (3.72) | ‘Hooray for healthy brains! Stronger disease-fighting systems protect ppl who don’t smoke cigs from brain-swelling infections. Go brain power! :)’ ‘Users have to worry about e-juice spills on the skin which can cause nasty heart problems. Using e-cigs means more risk of being poisoned! :(’
Work and Marketing | 275 (28.18) | 1.74 (3.32) | ‘Consumers protect their health by ignoring attempts from tobacco companies to target them with concepts of “lighter” &/or “more natural” products.’ ‘Tobacco users lower their chances of getting a job. In Texas, there aren’t any laws to protect them from being denied work just because they use tobacco.’
Leisure | 255 (26.05) | 1.82 (3.60) | ‘Social trends-here today, gone tomorrow but their effects can last much longer. Ppl help themselves by avoiding hookah bars & hanging out in smoke-free places.’ ‘Wondering why some college kids avoid hookah bars? They know that hookah is bad & are trying not to infect their lungs.’
Money | 118 (12.05) | 0.65 (1.88) | ‘The main cause of lung disease is cig smoke. Each year lung disease causes millions of deaths, high health costs & work-force-losses.’ ‘“Alternative” cigarettes, like bidis, are marketed as additive-free. Nonsmokers benefit by avoiding all cigarettes including “alternative” products.’
Death | 34 (3.45) | 0.16 (0.86) | ‘Cigar use is estimated to cause approximately 9,000 premature deaths every year. That is almost 140,000 years of potential life lost annually.’ ‘Don’t be a loser at the game of life! The nicotine in e-juice refills can be deadly, so drinking it can lead to a fatal overdose. Play it safe! :(’
Sexuality | 33 (3.38) | 0.17 (0.96) | ‘Snus, tobacco in a pouch held inside the mouth, poses greater risk to pregnancies. Pregnant women who use snus increase their risk of a preterm birth.’ ‘Unlike nonsmokers, male smokers increase their risk of erectile dysfunction by 30%. Smoking cigarettes increases the risk of developing this disorder.’
Religion | 22 (2.25) | 0.09 (0.63) | ‘Using electronic cigarettes raises the intention of using conventional cigarettes because users create a demon of addiction that some subdue with conventional cigarettes. :(’ ‘Avoiding the scourge of secondhand smoke is a blessing for pregnant women! Mothers who avoid the toxic smoke of cigarettes are able to enjoy a healthier and happier baby. :)’

a Indicates sample size and percentage of text messages identified under the theme.

b Indicates mean frequency of words depicting the theme and standard deviation. Words in italics are identified by LIWC under the theme in question.

DISCUSSION

This is the first report on the design and validation of a large-scale text message library for tobacco-risk communication. Overall, the results indicate that: 1) the majority of the text messages in the Texas-TCORS library are valid messages to communicate risk and use a wide range of depth and appeal, 2) several identified text messages may benefit from improvement, and 3) the library contains themes beyond framing, depth or appeal.

Based on these findings, tobacco-control advocates who intend to disseminate mobile phone text messages for risk communication can safely apply several messages from the library to young adults to convey information in the intended style, whether emotional, rational, simple or complex. In all, 874 (89.55%) text messages were found to be appropriately associated with depth and appeal factors. Thus, for the overwhelming majority of the text messages, predictive validity indicated that message development based on depth and appeal agrees with LIWC-coding. The higher the frequency of long words and the higher the number of words per message the more likely the message is complex. Also, the higher the frequency of affect words and the lower the frequency of cognitive words the more likely a message is emotional. The results are further supported by visually inspecting scatter plots. The medians for word count and word length corresponded to approximately half complex messages and 40% simple messages. Similarly, the medians for affect and cognitive processing corresponded to approximately 40% emotional messages and 40% rational messages.

However, several messages showed disagreement between LIWC-coding and writers’ design and may benefit from additional modifications before being used in tobacco-control campaigns based on their intended style. In the context of message depth, 10 messages designed to be complex indicated simplicity based on word length and word count. Disagreement in message depth was mainly due to the use of long words. About 18% of messages designed to be simple scored above the median in word length. The use of shorter words may assist in making such messages simpler. Several messages may also need rectification with respect to appeal. In particular, 51 messages designed to be rational exhibited no cognitive processing words. Cognitive processing words (e.g. ‘thus’, ‘if’, ‘because’, ‘perhaps’) can allow the messages to present logic in a chain of thoughts, and as a result strengthen their rational style. These revised messages can then be re-examined within the LIWC coding framework to ensure they are consistent with intended themes. In addition, gain-framed and loss-framed messages differed with respect to emotional appeal and word count. As a result, researchers may need to reexamine framing and balance the two categories with respect to appeal and complexity.

While the messages were purposely designed based on framing, depth and appeal, new themes were identified during LIWC coding. In addition to themes such as health and death, typical themes relevant for tobacco-risk communication were found, including social connections and leisure, work and marketing, money, sexual connotations, and religion. Ultimately, tobacco-control professionals can select text messages based on the themes that meet their populations’ needs. Further categorization may give mobile phone campaigns a higher level of tailoring, based on college student beliefs, risk perceptions and misconceptions about the effects of tobacco.

Our application of LIWC for content analysis has some assumptions and associated limitations. First, compared with traditional manual coding, LIWC content analysis provides an objective quantification of message depth and appeal. However, this crucially assumes that: 1) word count and word length are accurate proxies for message complexity, and 2) the frequencies of affect and cognitive processing words are appropriate proxies for message appeal. As a result, some other characteristics of message depth and appeal are not considered, such as reading competency and paralinguistic cues of emotion expression. Nevertheless, the coding features used by LIWC have been supported by previous literature37,52-54, and were able to predict message categories as designed by writers. Second, being an automated procedure, LIWC-coding does involve limitations. While LIWC captures the frequency of words for a specific category, it does not capture insinuations made through a group of words in a message. This could be addressed by researchers through manual coding. Third, this study is limited to English-speaking young adults attending community colleges. In the future, it may be beneficial to conduct intensive cross-validation with a wider audience, including young adults from different cultures and in the context of other languages.

Several implications can be inferred from this study. First, the findings suggest that tobacco control professionals who aim to engage in risk communication need to develop health messages with careful consideration of content and the application of several design approaches. In addition to social and behavioral formative research (e.g. focus group discussions, in-depth interviews, application of health theories, and literature review), the use of an objective content analysis method, such as LIWC, can benefit message development at several levels. In particular, as an objective and evidence-based method of text coding, the LIWC procedure can give tobacco-control professionals a foundation of data about the text messages as they make decisions about message content and features. Also, future work may consider expanding upon the themes identified in the current study. For instance, developing text messages that include social connotations and leisure may be useful in order to tap into the role of social influence and peer pressure in tobacco use among youths. As a next step, the Texas-TCORS library will be tested as part of a randomized controlled trial to determine the most effective text messages with young adults, in terms of increasing risk perceptions. Findings will be utilized as the basis for other campaigns with this age group and other populations, so that the risks of conventional and emerging tobacco products can be widely disseminated.

CONCLUSIONS

Tobacco-risk researchers can safely use messages from the Texas-TCORS library with young adults to convey information in the intended style. While some messages may benefit from additional modifications, most revealed agreement between LIWC and human categorization. In addition, several new themes are identified from the message library using LIWC, including social connections and leisure, work and marketing, money, sexual connotations, and religion. Future work may expand upon the new themes. Findings will be utilized to develop new campaigns, so that risks of tobacco products can be widely disseminated.