Conversational bots for cognitive and emotional (re-)balancing

Chatbots and natural language generation (NLG) are increasingly seen as promising — actually transformative — ways of interacting with users. Commonly cited use cases range from diverting call center traffic to targeted conveyance of information (e.g., answering FAQs). ChatGPT has recently made a huge splash, focusing widespread attention on going beyond automation to understanding, synthesizing, and conversing in (seemingly) sentient ways. Leaving aside the ethical implications, it’s not hard to foresee chatbots becoming ubiquitous in our everyday digital interactions.

Here at Substep, we are interested in chatbots for engaging with users through cognitive and emotional rebalancing. Luckily, there is a growing literature on this topic within the mental health and wellness space. Here are a few interesting takeaways:

  1. Conversational agents can improve the mood of individuals who have experienced stress, emotional distress, frustration, and social exclusion/isolation (de Gennaro et al., 2020)
  2. The user needs to establish trust with the chatbot before adoption happens, especially when s/he is communicating sensitive and private information, as is the case in mental health (Tudor Car et al., 2020)
  3. Effective content depends heavily on the kind of conversation underway. For example, using emojis when the topic relates to physical well-being lowers engagement, while using emojis when the topic relates to mental well-being increases engagement (Rapp et al., 2021); a sketch of this kind of topic-conditioned content follows this list
  4. Robust, large-n evaluation frameworks for assessing clinical outcomes are generally lacking. Evaluations can target digital interventions alone or hybrid care models (Tudor Car et al., 2020). That said, OneMind offers an evaluation framework touching on chat capabilities in digital mental health
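
To make the third takeaway concrete, here is a minimal sketch of topic-conditioned content, assuming an upstream classifier has already labeled the conversation topic. The topic labels and emoji choices are hypothetical placeholders, not anything prescribed by Rapp et al.:

```python
# Minimal sketch of topic-conditioned content (per Rapp et al., 2021):
# include emojis for mental well-being topics, suppress them elsewhere.
# Topic labels and emoji choices are hypothetical placeholders.

EMOJI_BY_SENTIMENT = {"positive": " 🙂", "supportive": " 💙"}

def decorate_response(text: str, topic: str, sentiment: str = "supportive") -> str:
    """Append an emoji only when the conversation topic is mental well-being."""
    if topic == "mental_wellbeing":
        return text + EMOJI_BY_SENTIMENT.get(sentiment, "")
    return text  # physical well-being and other topics stay plain

print(decorate_response("Nice work on today's gratitude exercise!", "mental_wellbeing"))
print(decorate_response("Remember to stretch before your run.", "physical_wellbeing"))
```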

In a comprehensive literature review assessing chatbots in commerce, Lim et al. (2021) identify 17 social science theories that offer guidance on when and why chatbot engagement can be effective. The theories have disciplinary homes ranging from psychology to information technology, marketing, communications, and law. Here, we expand on the original content (Table 5 in the Lim et al. paper) to describe each theory, provide a real-world example, and consider how the theory could inform chatbot development in the wellness space.

Parasocial relationship theory (Sociology)
Description: Parasocial interaction (PSI) describes nonreciprocated audience interactions with media personae: imagined social relationships with people who are distant from us and who do not reciprocate individual communication or interest. PSI is the illusion of interaction during viewing, while parasocial relationships (PSRs) are the ongoing perceived connection audience members experience with media personae over a longer period of time. Parasocial attachment occurs when the viewer seeks regular proximity to the mediated experience in order to experience an affective bond with a selected media persona.
Example: If you feel like you’re one of the gang while watching the characters from Friends spend time together at Central Perk, you’re experiencing a parasocial interaction. If you continue to think about Rachel, Chandler, Monica, or another member of the group after the episode ends, perhaps even referencing their behavior as if they were people you know, you’ve formed a parasocial relationship with those characters.
Application: A chatbot might be perceived as a friend, and if the experience includes videos, users may relate to the on-screen character and form a parasocial relationship with it. Social media videos come close to realizing this theory: viewers relate to the person and experience a parasocial interaction.

Expectancy theory of motivation (Psychology)
Description: Vroom’s expectancy theory assumes that behavior results from conscious choices among alternatives whose purpose is to maximize pleasure and minimize pain. It rests on three components: (1) expectancy, the belief that increased effort leads to increased performance; (2) instrumentality, the belief that good performance leads to a valued outcome; and (3) valence, the importance placed on the expected outcome.
Example: Expectancy theory is all about perception. If you believe effort will increase performance, that better performance will lead to a preferred outcome, and that the outcome matters, then you (the user, employee, family member, etc.) will engage.
Application: If users perceive that engaging with a chatbot will eventually give rise to an outcome they value, they will engage with the chatbot.

Uncanny valley theory (Psychology)
Description: The uncanny valley describes the relationship between the human-like appearance of a robotic object and the emotional response it evokes. In this phenomenon, people feel a sense of unease or even revulsion in response to humanoid robots that are highly, but not perfectly, realistic.
Example: Early test screenings of Shrek elicited unexpected anxiety in children in response to the character Princess Fiona. She was simply too lifelike, leaving kids unnerved and even frightened; many cried whenever she appeared onscreen.
Application: We may not want a bot to be presented as too realistic. Users may not be comfortable disclosing to and talking with a bot that feels too lifelike.

Stimulus-organism-response (SOR) model (Psychology)
Description: The SOR model describes the connection between stimuli (external factors) that affect organisms (people’s cognition and emotions) and the responses people have to those stimuli (such as behavior). Stimulus (S) refers to input, an external factor in the environment. The organism (O) encompasses the feelings, emotions, and cognitive states that respond to stimuli. Response (R) refers to the actions and reactions that follow. SOR adds the organism, with its emotional and cognitive components; it’s not just stimulus-response.
Example: Why does one person look at a cockroach or a lizard, jump, and shriek, while another simply feels amused watching them, thinking, “What’s the worst the cockroach can do?” Why do some people wage an internal battle every time the alarm rings in the morning while others get up immediately? How a person perceives and reacts to a situation depends on how they process external stimuli, both cognitively and emotionally.
Application: If the bot understands the user (the organism), it can react appropriately to a particular problem or situation.

Theory of reasoned action (TRA) (Psychology)
Description: TRA suggests that a person’s behavior is determined by their intention to perform the behavior and that this intention is, in turn, a function of their attitude toward the behavior and subjective norms.
Example: If one believes that recreational drug use (the behavior) is acceptable within one’s social group, s/he will be more likely to engage in the activity.
Application: Providing statistics backing a preferred behavior can help change a user’s impression of that behavior. Similarly, offering stories or videos can provide social proof of new norms.

Theory of boundary regulation (Psychology)
Description: Privacy regulation theory was developed by social psychologist Irwin Altman in 1975. It aims to explain why people sometimes prefer staying alone and sometimes like to get involved in social interactions. Altman holds that the goal of privacy regulation is to achieve the optimum level of privacy (i.e., the ideal level of social interaction). In this optimizing process, we all strive to match achieved privacy (the actual level of contact at a specific time) with desired privacy. At the optimum level, we experience solitude when we want to be alone and enjoy social contact when we want to be with people.
Example: Although Altman proposed privacy regulation theory well before the cyber age, recent studies have applied it to suggest new ways of thinking about privacy in socio-technical environments. With information technology, privacy extends from physical spaces to virtual spaces; privacy management becomes a dynamic balancing of boundaries as the context changes, and virtual space creates new contexts.
Application: Even if a bot is omniscient and knows everything about a user, does it make sense to put that information forward when the user expects less interaction and more privacy? How can an AI system know a user’s preferred privacy stance at a given point in time, prior to an interaction?

Communication accommodation theory (Communications)
Description: Communication accommodation theory (CAT) is a general theoretical framework for both interpersonal and intergroup communication. It seeks to explain and predict why, when, and how people adjust their communicative behavior during social interaction, and what social consequences result from those adjustments.
Example: People adopt the slang their friends use to fit in. People talk differently, using different words and gestures, with different groups: the elderly, children, women, men, teens, rich, poor, powerful, weak, and so on. People also behave differently with friends than with family.
Application: What is a user’s preferred way of interacting with a bot? Can the bot adjust on the fly to different individuals? What would that look like?

Technology acceptance model (TAM) (Information technology)
Description: The technology acceptance model (Davis, 1989), or TAM, posits that two factors determine whether a computer system will be accepted by its potential users: (1) perceived usefulness and (2) perceived ease of use. The key feature of this model is its emphasis on the perceptions of the potential user.
Example: TAM predicts that intentions lead to behavior; however, intentions do not always guarantee behavior. Someone might intend to use online therapy but not follow through. Several factors influence the strength of the relationship between intentions and behavior.
Application: Make the bot useful, make it easy to use, and market it as such. Hands-on interactions that enhance perceptions of usefulness and ease of use should increase adoption.

Big five factors of personality (Psychology)
Description: The five broad personality traits described by the theory are extraversion (also often spelled extroversion), agreeableness, openness, conscientiousness, and neuroticism. Certain personality types are theorized to weigh perceived usefulness and ease of use differently when forming a behavioral intention to adopt a technology.
Example: An analytical type (a conscientious person) often prefers a lot of information before making a decision.
Application: Personalizing based on these personality traits is an option.

Theory of planned behavior (TPB) (Psychology)
Description: The theory of planned behavior (TPB) started as the theory of reasoned action in 1980; it predicts an individual’s intention to engage in a behavior at a specific time and place and is intended to explain all behaviors over which people can exert self-control. The key component of the model is behavioral intent, which is influenced by attitudes about the likelihood that the behavior will have the expected outcome and by a subjective evaluation of the risks and benefits of that outcome.
Example: Someone might intend to meditate every day but not follow through. Several factors influence the strength of the relationship between intentions and behavior.
Application: Reads similar to TRA. Planning is not equal to behavior; push notifications could be helpful.

Flow theory (Psychology)
Description: Flow refers to a state of mind that brings together cognitive, physiological, and affective aspects. Flow corresponds to an optimal psychophysical state: participants describe it as being in the zone, on the ball, or in the groove. Flow also inspires peak performance, so some use expressions such as “everything clicks” or “experiencing a magic moment.”
Example: A writer experiencing a state of flow may become so immersed in their work that time passes without them even noticing.
Application: Deep conversations are unlikely to elicit a flow state, especially within a chatbot context. However, recent advancements in large language models and NLG may challenge this assumption.

Humor theory (Communications)
Description: Three theories of humor creation emerge in humor research: (1) relief theory, which focuses on the physiological release of tension; (2) incongruity theory, which singles out violations of a rationally learned pattern; and (3) superiority theory, which involves a sense of victory or triumph. Each theory helps explain the creation of different aspects of humor, but each runs into problems explaining rhetorical applications of humor.
Example: Relief theory: we laugh when something relieves psychological tension by allowing us to face our fears, release nervous energy, and overcome inhibitions, as in “The worst is when you ask someone on a date and they turn you down. ’Cause what they’re really saying is, ‘You know what? I don’t even feel like eating a free meal around you.’” Superiority theory: we laugh at the misfortunes and shortcomings of others because it makes us feel better about ourselves; think of a stand-up comedian’s self-deprecating humor. Incongruity (or surprise) theory: we laugh when our perception of a situation suddenly changes, as in “I am a man of my word. And that word is ‘unreliable.’”
Application: A small number of wellness bots use self-deprecation to lighten the mood and work around shortcomings in understanding; the other theories may require more wit.

Theory of mind (Psychology)
Description: Theory of mind is the branch of cognitive science that investigates how we ascribe mental states to other persons and how we use those states to explain and predict the actions of those persons. More precisely, it is the branch that investigates mindreading, mentalizing, or mentalistic abilities.
Example: Without directly knowing what is in someone’s mind, we can observe their actions and speech and come to a (subjective) conclusion about their intentions, thoughts, and desires.
Application: A bot can prompt the user to think about other perspectives on an issue. Additionally, the user may develop an opinion about the intelligence and personality of the bot.

Coolness model (Marketing)
Description: These days, when we float an idea for an interface or demo a prototype, the compliment we crave is “This is cool!” Coolness has become a major design goal for HCI professionals, and if we are serious about building cool into our products, we should also be serious about measuring it. With this in mind, researchers have scientifically explicated the concept to capture the psychological essence of “coolness,” covering characteristics such as trendiness, uniqueness, rebelliousness, genuineness, and utility.
Example: Brand love achieved through product superiority, design, and marketing (e.g., Apple).
Application: A cool bot? That can mean a lot of things, but emotional resonance, cognition, humor, etc. likely all play a factor.

Social agency theory (Sociology)
Description: From one viewpoint, social agency theory posits that verbal and visual cues, such as a voice that is more humanlike than overtly artificial, can encourage learners to treat their interaction with the computer as similar to a human-to-human conversation. An alternative view is Richard Mayer’s social agency theory of multimedia learning, which proposes that social cues may prime social responses in learners that lead to deeper cognitive processing during learning and hence better test performance.
Example: Many companies brand their chatbots with an avatar or persona to make them feel more humanlike. Note that this can be at odds with what uncanny valley theory tells us.
Application: Use an avatar, build in memes that mix text with images and video, and so on. Conversational chatbots that seem human in their interaction can help users deepen their understanding of new material.

Unified theory of acceptance and use of technology (UTAUT) (Information technology)
Description: UTAUT suggests that actual use of technology is determined by behavioral intention. The perceived likelihood of adopting the technology depends on the direct effect of four key constructs: performance expectancy, effort expectancy, social influence, and facilitating conditions.
Example: UTAUT summarizes many of the theories above, including expectancy theory, TRA, and TPB, and it explicitly extends TAM.
Application: A chatbot that fully takes UTAUT into account will draw on real-time interaction data, demographic data, and outcomes analysis to guide users in the right direction.

Contextual integrity theory (Law)
Description: Privacy is defined by the appropriateness of information flows given the norms of typical flows; ethical concerns may evolve over time along with these norms.
Example: Patient medical information is formally governed by HIPAA in the US but also informally by everyday context. A patient would not expect a dermatologist to ask about mental health issues, for example.
Application: How much privacy do users expect from a bot in a given context?

These theories are interesting for a few different reasons. First, some theories conflict with one another, while others subsume or closely resemble others; we therefore need to take a balanced view when applying them. Second, the rich disciplinary perspectives offer terrific guidance on how state-of-the-art language models and conversational interfaces could be designed and assessed, and evaluations should reflect the different assumptions and causal mechanisms underlying each theory. Third, taken together, the theories convey the breadth of variables we need to capture and crunch before a high-powered chatbot can return a salient, comprehensible, privacy-respecting, and norm-abiding response.

Generating meaningful responses within a specific chatbot framework is a tricky proposition — especially if our goal is to improve mental wellness. A primary way to increase chatbot utility is to personalize the experience, i.e., to “harmonize chatbot content with individual treatment recommendations” (Abd-Alrazaq et al., 2021). Because they learn non-linear representations of large input spaces when predicting outputs, deep learning architectures are particularly well-suited for fine-grained personalization. This holds even if the model results don’t map neatly back to social science concepts and variable construction / hypothesis development. At the same time, chatbot responses should align with clinical evidence and best practices and be evaluated accordingly.
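
As a loose illustration of what fine-grained personalization might look like (a sketch, not our implementation; all names, tags, and weights below are hypothetical), candidate responses could be re-ranked against a user profile that encodes treatment recommendations:

```python
# Hypothetical sketch: re-rank candidate responses by combining a model's
# relevance score with alignment to the user's treatment plan and tone.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    preferred_tone: str                                    # e.g., "warm", "direct"
    treatment_tags: set[str] = field(default_factory=set)  # e.g., {"cbt", "sleep"}

@dataclass
class Candidate:
    text: str
    tone: str
    tags: set[str]
    relevance: float  # from an upstream language model

def score(c: Candidate, p: UserProfile) -> float:
    tone_bonus = 0.2 if c.tone == p.preferred_tone else 0.0  # weights are illustrative
    return c.relevance + tone_bonus + 0.1 * len(c.tags & p.treatment_tags)

def personalize(candidates: list[Candidate], profile: UserProfile) -> str:
    """Return the candidate text that best matches the user's profile."""
    return max(candidates, key=lambda c: score(c, profile)).text
```

In practice, a deep learning model would learn these trade-offs end to end rather than relying on hand-tuned weights; the sketch just makes the inputs explicit.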

But one step at a time, right? Stay tuned for a technical follow-up on getting started with Rasa, an open source chatbot framework, and deploying Rasa using AWS serverless computing.
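
As a small taste of that follow-up, here is roughly what a custom action looks like in Rasa's Python SDK (the action name and slot are placeholders, not a working assistant):

```python
# A minimal Rasa custom action using rasa_sdk; names are placeholders.
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

class ActionCheckIn(Action):
    def name(self) -> Text:
        return "action_check_in"  # referenced from the domain's rules/stories

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # A slot like "mood" would be filled by an upstream form or NLU entity.
        mood = tracker.get_slot("mood") or "unknown"
        dispatcher.utter_message(text=f"Thanks for checking in. Noted mood: {mood}.")
        return []
```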