Curated by THEOUTPOST
On Fri, 28 Mar, 12:06 AM UTC
9 Sources
[1]
The first trial of generative AI therapy shows it might help with depression
A team led by psychiatric researchers and psychologists at the Geisel School of Medicine at Dartmouth College built the tool, called Therabot, and the results were published on March 27 in NEJM AI, a journal from the publishers of the New England Journal of Medicine.

Many tech companies have built AI tools for therapy, promising that people can talk with a bot more frequently and cheaply than they can with a trained therapist -- and that this approach is safe and effective. Many psychologists and psychiatrists have shared the vision, noting that fewer than half of people with a mental disorder receive therapy, and those who do might get only 45 minutes per week. Researchers have tried to build tech so that more people can access therapy, but they have been held back by two things.

One, a therapy bot that says the wrong thing could result in real harm. That's why many researchers have built bots using explicit programming: the software pulls from a finite bank of approved responses (as was the case with Eliza, a mock-psychotherapist computer program built in the 1960s). But this makes them less engaging to chat with, and people lose interest. The second issue is that the hallmarks of good therapeutic relationships -- shared goals and collaboration -- are hard to replicate in software.

In 2019, as early large language models like OpenAI's GPT were taking shape, the researchers at Dartmouth thought generative AI might help overcome these hurdles. They set about building an AI model trained to give evidence-based responses. They first tried building it from general mental-health conversations pulled from internet forums. Then they turned to thousands of hours of transcripts of real sessions with psychotherapists.

"We got a lot of 'hmm-hmms,' 'go ons,' and then 'Your problems stem from your relationship with your mother,'" said Michael Heinz, a research psychiatrist at Dartmouth College and Dartmouth Health and first author of the study, in an interview. "Really tropes of what psychotherapy would be, rather than actually what we'd want."

Dissatisfied, they set to work assembling their own custom data sets based on evidence-based practices, which is what ultimately went into the model. Many AI therapy bots on the market, in contrast, might be just slight variations of foundation models like Meta's Llama, trained mostly on internet conversations. That poses a problem, especially for topics like disordered eating. "If you were to say that you want to lose weight," Heinz says, "they will readily support you in doing that, even if you will often have a low weight to start with." A human therapist wouldn't do that.

To test the bot, the researchers ran an eight-week clinical trial with 210 participants who had symptoms of depression or generalized anxiety disorder or were at high risk for eating disorders. About half had access to Therabot, and a control group did not. Participants responded to prompts from the AI and initiated conversations, averaging about 10 messages per day.

Participants with depression experienced a 51% reduction in symptoms, the best result in the study. Those with anxiety experienced a 31% reduction, and those at risk for eating disorders saw a 19% reduction in concerns about body image and weight. These measurements are based on self-reporting through surveys, a method that's not perfect but remains one of the best tools researchers have.
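For a sense of what those headline percentages mean mechanically: a percent reduction compares average questionnaire scores before and after treatment. A minimal sketch, with invented PHQ-9-style scores rather than the study's data:

```python
# Hypothetical illustration: percent symptom reduction computed from
# before/after questionnaire scores (e.g., PHQ-9 for depression).
# The scores below are invented, shaped only to echo the reported 51%.

def mean(xs):
    return sum(xs) / len(xs)

baseline_scores = [18, 15, 21, 14, 17]  # moderate-to-severe at enrollment
week8_scores    = [8, 7, 11, 7, 9]      # after eight weeks

reduction = (mean(baseline_scores) - mean(week8_scores)) / mean(baseline_scores)
print(f"Average symptom reduction: {reduction:.0%}")  # prints ~51%
```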
[2]
How do you teach an AI model to give therapy?
I was surprised by those results, which you can read about in my full story. There are lots of reasons to be skeptical that an AI model trained to provide therapy is the solution for millions of people experiencing a mental health crisis. How could a bot mimic the expertise of a trained therapist? And what happens if something gets complicated -- a mention of self-harm, perhaps -- and the bot doesn't intervene correctly?

The researchers, a team of psychiatrists and psychologists at Dartmouth College's Geisel School of Medicine, acknowledge these questions in their work. But they also say that the right selection of training data -- which determines how the model learns what good therapeutic responses look like -- is the key to answering them.

Finding the right data wasn't a simple task. The researchers first trained their AI model, called Therabot, on conversations about mental health from across the internet. This was a disaster. If you told this initial version of the model you were feeling depressed, it would start telling you it was depressed, too. Responses like "Sometimes I can't make it out of bed" or "I just want my life to be over" were common, says Nick Jacobson, an associate professor of biomedical data science and psychiatry at Dartmouth and the study's senior author. "These are really not what we would go to as a therapeutic response." The model had learned from conversations held on forums between people discussing their mental health crises, not from evidence-based responses.

So the team turned to transcripts of therapy sessions. "This is actually how a lot of psychotherapists are trained," Jacobson says. That approach was better, but it had limitations. "We got a lot of 'hmm-hmms,' 'go ons,' and then 'Your problems stem from your relationship with your mother,'" Jacobson says. "Really tropes of what psychotherapy would be, rather than actually what we'd want."

It wasn't until the researchers started building their own data sets using examples based on cognitive behavioral therapy techniques that they started to see better results. It took a long time. The team began working on Therabot in 2019, when OpenAI had released only the first two versions of its GPT model. Now, Jacobson says, over 100 people have spent more than 100,000 human hours designing this system.

The importance of training data suggests that the flood of companies promising therapy via AI models -- many of them not trained on evidence-based approaches -- is producing tools that are at best ineffective and at worst harmful. Looking ahead, there are two big things to watch: Will the dozens of AI therapy bots on the market start training on better data? And if they do, will their results be good enough to get a coveted approval from the US Food and Drug Administration? I'll be following closely. Read more in the full story.

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.
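The curation-then-fine-tuning process described above resembles ordinary supervised fine-tuning of a causal language model on expert-written dialogues. Below is a minimal, hypothetical sketch: the base model ("gpt2") and the two example exchanges are invented stand-ins, since the study does not disclose its exact pipeline.

```python
# A minimal sketch (not the study's actual pipeline): supervised fine-tuning
# of a small causal language model on hand-written, clinician-style
# patient/therapist exchanges. Model choice and data are invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the paper does not disclose Therabot's base model

# Each pair: a patient statement and an evidence-based style of reply.
curated_pairs = [
    ("I've been feeling very nervous and overwhelmed lately.",
     "Let's take a step back and ask why you feel that way."),
    ("I can't get out of bed most mornings.",
     "That sounds really hard. What tends to happen right after you wake up?"),
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for patient, therapist in curated_pairs:
        text = f"Patient: {patient}\nTherapist: {therapist}{tokenizer.eos_token}"
        enc = tokenizer(text, return_tensors="pt")
        # Causal-LM objective: labels mirror the inputs, so the model
        # learns to continue conversations in the curated response style.
        loss = model(**enc, labels=enc["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice the quality hinges on the data, not the loop: the Dartmouth team reports spending over 100,000 human hours writing and vetting examples like these.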
[3]
Can a Chatbot Be Your Therapist? A Study Found 'Amazing Potential' With the Right Guardrails
Your future therapist might be a chatbot, and you might see positive results, but don't start telling ChatGPT your feelings just yet. A new study by researchers at Dartmouth found that a generative AI tool designed to act as a therapist led to substantial improvements for patients with depression, anxiety and eating disorders -- but the tool still needs to be closely watched by human experts. The study was published in March in the journal NEJM AI.

Researchers conducted a trial with 106 people who used Therabot, a smartphone app developed at Dartmouth over the past several years. It's a small sample, but the researchers said it's the first clinical trial of an AI therapy chatbot. The results show significant advantages, mainly because the bot is available 24 hours a day, which bridges the immediacy gap patients face with traditional therapy. However, researchers warn that generative AI-assisted therapy can be perilous if not done right.

"I think there's a lot yet for this space to evolve," said Nick Jacobson, the study's senior author and an associate professor of biomedical data science and psychiatry at Dartmouth. "It's really amazing the potential for personalized, scalable impact."

The 210 participants were sorted into two groups -- one group of 106 was allowed to use the chatbot, while the control group was left on a "waiting list." The participants were evaluated for their anxiety, depression or eating disorder symptoms using standardized assessments before and after the test period. For the first four weeks, the app prompted its users to engage with it daily. For the second four weeks, the prompts stopped, but people could still engage on their own.

Study participants actually used the app, and the researchers said they were surprised by how much and how closely people communicated with the bot. Surveyed afterward, participants reported a degree of "therapeutic alliance" -- trust and collaboration between patient and therapist -- similar to that for in-person therapists. The timing of interactions was also notable, with interactions spiking in the middle of the night and at other times when patients often experience concerns. Those are the hours when reaching a human therapist is particularly difficult.

"With Therabot, folks will access and did access it throughout the course of the trial in their daily life, in moments where they need it the most," Jacobson said. That included times when someone has difficulty getting to sleep at 2 a.m. because of anxiety, or in the immediate wake of a difficult moment.

Patients' assessments afterward showed a 51% drop in symptoms for major depressive disorder, a 31% drop in symptoms for generalized anxiety disorder and a 19% drop in symptoms for eating disorders among patients at risk for those specific conditions. "The people who were enrolled in the trial weren't just mild," Jacobson said. "The folks in the group were moderate to severe in depression, for example, as they started. But on average experienced a 50% reduction in their symptoms, which would go from severe to mild or moderate to nearly absent."

The research team didn't just choose 100-plus people who needed support, give them access to a large language model like OpenAI's ChatGPT and see what happened. Therabot was custom-built -- fine-tuned -- to follow specific therapy procedures.
It was built to watch out for serious concerns, like indications of potential self-harm, and report them so a human professional could intervene when needed. Humans also tracked the bot's communications so they could step in when the bot said something it shouldn't have.

Jacobson said that during the first four weeks of the study, because of the uncertainty of how the bot would behave, he read every message it sent as soon as possible. "I did not get a whole lot of sleep in the first part of the trial," he said. Human interventions were rare, Jacobson said. Testing of earlier models two years ago showed more than 90% of responses were consistent with best practices. When the researchers did intervene, it was often because the bot offered advice outside a therapist's scope -- as when it tried to provide more general medical advice, like how to treat a sexually transmitted disease, instead of referring the patient to a medical provider. "Its actual advice was all reasonable, but that's outside the realm of care we would provide," Jacobson said.

Therabot isn't your typical large language model; it was essentially trained by hand. Jacobson said a team of more than 100 people created a dataset using best practices on how a therapist should respond to actual human experiences. "Only the highest quality data ends up being part of it," he said. A general model like Google's Gemini or Anthropic's Claude, for example, is trained on far more data than just medical literature and may respond improperly.

The Dartmouth study is an early sign that specially built tools using generative AI can be helpful in some cases, but that doesn't mean any AI chatbot can be your therapist. This was a controlled study with human experts monitoring it, and there are dangers in trying this on your own. Remember that most general large language models are trained on oceans of data found on the internet. So while they can sometimes provide good mental health guidance, they also include bad information -- like how fictional therapists behaved, or what people posted about mental health on online forums. "There's a lot of ways they behave in profoundly unsafe ways in health settings," Jacobson said.

Even a chatbot offering helpful advice might be harmful in the wrong setting. Jacobson said if you tell a chatbot you're trying to lose weight, it will come up with ways to help you. But if you're dealing with an eating disorder, that may be harmful.

Many people are already using chatbots to perform tasks that approximate the work of a therapist. Jacobson says you should be careful. "There's a lot of things about it in terms of the way it's trained that very closely mirrors the quality of the internet," he said. "Is there great content there? Yes. Is there dangerous content there? Yes." Treat anything you get from a chatbot with the same skepticism you would treat an unfamiliar website, Jacobson said. Even though output from a gen-AI tool may look more polished, it can still be unreliable.
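The oversight loop described above -- screen each bot message, flag risky or out-of-scope content, and queue it for a human reviewer -- can be sketched as a simple gating layer. Everything below (the keyword lists, the queue, the function names) is invented for illustration; the study team's real tooling, and any production system, would rely on vetted classifiers rather than keyword matching.

```python
# Invented sketch of a human-in-the-loop review gate; the keyword lists,
# queue, and function are illustrative stand-ins, not the study's tooling.
from dataclasses import dataclass, field

RISK_TERMS = ("suicide", "kill myself", "self-harm")
OUT_OF_SCOPE_TERMS = ("dosage", "prescription", "sexually transmitted")

@dataclass
class ReviewQueue:
    items: list = field(default_factory=list)

    def flag(self, message: str, reason: str) -> None:
        # In the trial, flagged exchanges were read by clinicians,
        # who reached out to participants directly when needed.
        self.items.append({"message": message, "reason": reason})

def screen_bot_reply(reply: str, queue: ReviewQueue) -> str:
    lowered = reply.lower()
    if any(term in lowered for term in RISK_TERMS):
        queue.flag(reply, "possible safety concern")
    if any(term in lowered for term in OUT_OF_SCOPE_TERMS):
        # Out-of-scope medical advice: refer out instead of answering.
        queue.flag(reply, "outside therapeutic scope")
        return "That question is best directed to a medical provider."
    return reply

queue = ReviewQueue()
print(screen_bot_reply("Ask your pharmacist about the right dosage.", queue))
print(len(queue.items), "message(s) queued for human review")
```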
[4]
First therapy chatbot trial shows AI can provide 'gold-standard' care
Dartmouth researchers conducted the first clinical trial of a therapy chatbot powered by generative AI and found that the software resulted in significant improvements in participants' symptoms, according to results published in NEJM AI, a journal from the publishers of the New England Journal of Medicine. People in the study also reported they could trust and communicate with the system, known as Therabot, to a degree that is comparable to working with a mental-health professional.

The trial consisted of 106 people from across the United States diagnosed with major depressive disorder, generalized anxiety disorder, or an eating disorder. Participants interacted with Therabot through a smartphone app by typing out responses to prompts about how they were feeling or initiating conversations when they needed to talk.

People diagnosed with depression experienced a 51% average reduction in symptoms, leading to clinically significant improvements in mood and overall well-being, the researchers report. Participants with generalized anxiety reported an average reduction in symptoms of 31%, with many shifting from moderate to mild anxiety, or from mild anxiety to below the clinical threshold for diagnosis. Among those at risk for eating disorders -- who are traditionally more challenging to treat -- Therabot users showed a 19% average reduction in concerns about body image and weight, which significantly outpaced a control group that also was part of the trial.

The researchers conclude that while AI-powered therapy is still in critical need of clinician oversight, it has the potential to provide real-time support for the many people who lack regular or immediate access to a mental-health professional.

"The improvements in symptoms we observed were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," says Nicholas Jacobson, the study's senior author and an associate professor of biomedical data science and psychiatry in Dartmouth's Geisel School of Medicine. "There is no replacement for in-person care, but there are nowhere near enough providers to go around," Jacobson says. For every available provider in the United States, there's an average of 1,600 patients with depression or anxiety alone, he says. "We would like to see generative AI help provide mental health support to the huge number of people outside the in-person care system. I see the potential for person-to-person and software-based therapy to work together," says Jacobson, who is the director of the treatment development and evaluation core at Dartmouth's Center for Technology and Behavioral Health.

Michael Heinz, the study's first author and an assistant professor of psychiatry at Dartmouth, says the trial results also underscore the critical work ahead before generative AI can be used to treat people safely and effectively. "While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter," says Heinz, who also is an attending psychiatrist at Dartmouth Hitchcock Medical Center in Lebanon, N.H. "We still need to better understand and quantify the risks associated with generative AI used in mental health contexts."

Therabot has been in development in Jacobson's AI and Mental Health Lab at Dartmouth since 2019.
The process included continuous consultation with psychologists and psychiatrists affiliated with Dartmouth and Dartmouth Health.

When people initiate a conversation with the app, Therabot answers with natural, open-ended text dialog based on an original training set the researchers developed from current, evidence-based best practices for psychotherapy and cognitive behavioral therapy, Heinz says. For example, if a person with anxiety tells Therabot they have been feeling very nervous and overwhelmed lately, it might respond, "Let's take a step back and ask why you feel that way." If Therabot detects high-risk content such as suicidal ideation during a conversation with a user, it will provide a prompt to call 911, or contact a suicide prevention or crisis hotline, with the press of an onscreen button.

The clinical trial provided the participants randomly selected to use Therabot with four weeks of unlimited access. The researchers also tracked the control group of 104 people with the same diagnosed conditions who had no access to Therabot. Almost 75% of the Therabot group were not under pharmaceutical or other therapeutic treatment at the time. The app asked about people's well-being, personalizing its questions and responses based on what it learned during its conversations with participants. The researchers evaluated conversations to ensure that the software was responding within best therapeutic practices.

After four weeks, the researchers gauged a person's progress through standardized questionnaires clinicians use to detect and monitor each condition. The team did a second assessment after another four weeks, when participants could initiate conversations with Therabot but no longer received prompts. After eight weeks, all participants using Therabot experienced a marked reduction in symptoms that exceeds what clinicians consider statistically significant, Jacobson says. These differences represent robust, real-world improvements that patients would likely notice in their daily lives, Jacobson says. Users engaged with Therabot for an average of six hours throughout the trial, the equivalent of about eight therapy sessions, he says.

"Our results are comparable to what we would see for people with access to gold-standard cognitive therapy with outpatient providers," Jacobson says. "We're talking about potentially giving people the equivalent of the best treatment you can get in the care system over shorter periods of time."

Critically, people reported a degree of "therapeutic alliance" in line with what patients report for in-person providers, the study found. Therapeutic alliance relates to the level of trust and collaboration between a patient and their caregiver and is considered essential to successful therapy. One indication of this bond is that people not only provided detailed responses to Therabot's prompts -- they frequently initiated conversations, Jacobson says. Interactions with the software also showed upticks at times associated with unwellness, such as in the middle of the night.

"We did not expect that people would almost treat the software like a friend. It says to me that they were actually forming relationships with Therabot," Jacobson says. "My sense is that people also felt comfortable talking to a bot because it won't judge them."

The Therabot trial shows that generative AI has the potential to increase a patient's engagement and, importantly, continued use of the software, Heinz says. "Therabot is not limited to an office and can go anywhere a patient goes.
It was available around the clock for challenges that arose in daily life and could walk users through strategies to handle them in real time," Heinz says. "But the feature that allows AI to be so effective is also what confers its risk -- patients can say anything to it, and it can say anything back."

The development and clinical testing of these systems need to have rigorous benchmarks for safety, efficacy, and the tone of engagement, and need to include the close supervision and involvement of mental-health experts, Heinz says. "This trial brought into focus that the study team has to be equipped to intervene -- possibly right away -- if a patient expresses an acute safety concern such as suicidal ideation, or if the software responds in a way that is not in line with best practices," he says. "Thankfully, we did not see this often with Therabot, but that is always a risk with generative AI, and our study team was ready."

In evaluations of earlier versions of Therabot more than two years ago, more than 90% of responses were consistent with therapeutic best practices, Jacobson says. That gave the team the confidence to move forward with the clinical trial. "There are a lot of folks rushing into this space since the release of ChatGPT, and it's easy to put out a proof of concept that looks great at first glance, but the safety and efficacy is not well established," Jacobson says. "This is one of those cases where diligent oversight is needed, and providing that really sets us apart in this space."
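The crisis-escalation path described in this article -- detect high-risk content, then surface one-tap emergency options instead of a generated reply -- maps onto a simple routing step in the app's message loop. A hypothetical sketch; the phrases, UI strings, and function are invented, and a crude keyword match stands in for whatever vetted detection the real app uses.

```python
# Hypothetical sketch of the user-facing crisis path described above.
# Phrases and UI strings are invented; a crude keyword match stands in
# for whatever vetted detection the real app uses.
CRISIS_PHRASES = ("want to die", "kill myself", "end my life")

CRISIS_PROMPT = ("It sounds like you may be in crisis. Tap a button below "
                 "to call 911 or a suicide prevention or crisis hotline.")

def route_user_message(message: str) -> dict:
    """Return the app's next action for an incoming user message."""
    if any(phrase in message.lower() for phrase in CRISIS_PHRASES):
        # Short-circuit normal dialogue: surface one-tap emergency options
        # and alert the study team instead of generating a model reply.
        return {"action": "crisis_prompt", "text": CRISIS_PROMPT,
                "buttons": ["Call 911", "Call crisis hotline"],
                "notify_team": True}
    return {"action": "generate_reply"}  # normal generative path

print(route_user_message("Some days I just want to end my life."))
```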
[5]
Mental Health AI Chatbot Rivals Human-Based Therapy in Less Time
A generative artificial intelligence (Gen-AI)-powered therapy chatbot known as Therabot was associated with significant reductions in symptoms of several mental health conditions, including major depressive disorder (MDD).

Developed by members of the investigative team, Therabot is a mobile app that allows users to interact with a digital presence they understand is not a real person. Using user prompts and conversation history, the chatbot delivers tailored dialogue, including empathetic responses and targeted questions.

In a randomized controlled trial (RCT) of more than 200 US participants, those who received the chatbot intervention for 4 weeks had significantly greater symptom reductions in MDD, generalized anxiety disorder (GAD), and feeding and eating disorders (EDs) than their peers who did not receive access to the app (waitlist control group) -- meeting its primary outcomes. On average, engagement with the app lasted more than 6 hours and was rated highly by patients.

"The effect sizes weren't just significant, they were huge and clinically meaningful -- and mirrored what you'd see in a gold-standard dose of evidence-based treatment delivered by humans over a longer period of time," senior study author Nicholas Jacobson, PhD, associate professor of biomedical data science and psychiatry at Dartmouth College's Geisel School of Medicine, Hanover, New Hampshire, told Medscape Medical News. The results were published online on March 27 in NEJM AI.

Therabot is an "expert-fine-tuned" Gen-AI-powered chatbot created specifically for mental health treatment, with experts writing therapist-patient dialogues based on cognitive behavioral therapy. Jacobson, who is also a director at Dartmouth's Center for Technology and Behavioral Health, Lebanon, New Hampshire, noted that the investigators started developing the app in 2019; more than 100,000 human hours have since gone into its creation and refinement.

"Therabot is designed to augment and enhance conventional mental health treatment services by delivering personalized, evidence-based mental health interventions at scale," the researchers wrote. Jacobson noted that other digital interventions created for the mental health space are often more structured and not adaptive or personalized, leading to lower engagement and large dropout rates. In addition, safety and efficacy have not been well established for many of these systems, he said. What sets this app apart, he said, is its long development history, the diligent oversight it provides, and its "personalized dynamic feedback," which responds much like a human therapist.

"We designed our own dataset written out with transcripts on what would be a gold-standard response to every different type of query you can imagine related to these conditions and also comorbidities," said Jacobson.

The researchers enrolled 210 adults (59.5% women; mean age, 33.9 years) with severe symptoms of MDD or GAD or at high risk for feeding and eating disorders. All were randomly assigned to interact daily with the chatbot intervention for 4 weeks (n = 106) or to receive no app access (waitlist, n = 104). Jacobson noted the investigators wanted to concentrate on these three specific conditions because they are among the most common mental disorders. "We wanted to have a starting place" that could be expanded upon in the future, including the possibility of other conditions, he added.

Daily prompts to interact with Therabot occurred throughout the 4-week treatment period.
The prompts stopped after that, but the group could still access the app during the following 4-week postintervention phase. Although the waitlist group was not given access to the app during the study period, they could gain access at the end of the follow-up at 8 weeks.

The co-primary outcomes were changes in symptoms from baseline to 4 weeks and to 8 weeks. Measures included the Patient Health Questionnaire 9, the GAD Questionnaire for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, and the Weight Concerns Scale within the Stanford-Washington University Eating Disorder Screen. User engagement, acceptability, and "therapeutic alliance" were all secondary outcomes. The investigators defined the latter as "the collaborative patient and therapist relationship" as measured on the Working Alliance Inventory-Short Revised (WAI-SR). Other measures included a patient satisfaction survey and the number of messages sent to the app.

Results showed that the chatbot group had significantly greater reductions in MDD symptoms than the waitlist group at 4 weeks (mean change, -6.13 vs -2.63; P < .001) and 8 weeks (mean change, -7.93 vs -4.22; P < .001). Similarly, the chatbot group also had greater reductions in symptoms of GAD at 4 weeks (mean change, -2.32 vs -0.13; P = .001) and 8 weeks (mean change, -3.18 vs -1.11; P = .003), and in symptoms of EDs at 4 weeks (mean change, -9.83 vs -1.66; P = .008) and 8 weeks (mean change, -10.23 vs -3.7; P = .03).

These improvements "were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," Jacobson said in a release. Based on WAI-SR scores, the participants also, on average, "reported a therapeutic alliance comparable to norms reported in an outpatient psychotherapy sample," the investigators reported.

Overall satisfaction with the app averaged 5.3 on a 7-point scale. In addition, the app received a 6.4 for ease of use, a 5.6 for being intuitive, a 5.4 for feeling better after a session, and a 4.9 for being "similar to a real therapist." The mean number of participant messages sent was 260, and the mean total amount of app interaction was 6.2 hours.

The investigators noted that they and trained clinicians examined all responses from the app and contacted the patient directly if any inappropriate responses were given. By study's end, staff interventions had been required 15 times because of safety concerns, such as after participants expressed suicidal ideation, and 13 times because of inappropriate app responses, such as providing medical advice.

"This is the first RCT demonstrating the effectiveness of a fully Gen-AI therapy chatbot for treating clinical-level mental health symptoms," the investigators noted. They credited three factors for the chatbot's success: it was "rooted" in evidence-based psychotherapies for the three conditions treated; it offered unrestricted, anytime access; and, "unlike existing chatbots for mental health treatment, Therabot was powered by Gen-AI, allowing for natural, highly personalized, open-ended dialogue."

Still, lead study author Michael Heinz, MD, assistant professor of psychiatry at Dartmouth and an attending psychiatrist at Dartmouth Hitchcock Medical Center, did voice some cautions. "The feature that allows AI to be so effective is also what confers its risk -- patients can say anything to it and it can say anything back," he said in the release.
That's why the various systems being developed need rigorous safety and efficacy benchmarks, as well as supervision and involvement from mental health experts, he said. "I don't necessarily think they need to be used with a prescription model. I just think we need human experts in the loop until we have a good understanding of their safety and efficacy," Heinz told Medscape Medical News. He added that human interventions weren't often needed with Therabot, "but that is always a risk with generative AI and our study team was ready."

So what's next? Although Therabot isn't currently available to patients or clinicians and remains only in the research space, the goal is to make it widely available in the next few years, Jacobson said. "But we want to proceed judiciously. A lot of our work is to ultimately scale it, but these models carry greater risk -- in part because of their flexibility. So we want to have greater oversight and further trials before we open it up," Jacobson added, noting that this could eventually include head-to-head comparisons with live providers.

Commenting for Medscape Medical News, Paul Appelbaum, MD, practicing psychiatrist and professor of psychiatry at Columbia University, New York City, described the study as interesting, with promising results. However, it was also a single study that "raises more questions than it answers about the use of AI-driven chatbots," said Appelbaum, who was not involved with the research. He noted that there may have been a "novelty effect" because of the intervention's relatively short duration, and that selection bias, which the investigators mention in their paper, could have resulted in an overestimation of the effectiveness of the digital intervention for the three conditions studied. "People who are willing to participate in a study of a chatbot may be predisposed to view technological approaches as appealing. So whether a random sample of the general population would have the same response is an open question," Appelbaum said.

He also pointed out that the control group didn't receive anything and wondered how a more active control intervention would have compared with the chatbot. "Is the difference between the two groups a function of the effect from the chatbot as opposed to the negative effect of being told 'you're just on a waiting list'?" Appelbaum also noted the investigators' ongoing supervision of the AI to ensure patient safety. "I think that's a very important caveat. There's a temptation to read this study as indicating that we can just turn patients over to chatbots and they'll take care of it -- but that is not what happened," he said.
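For readers wondering how contrasts like the mean changes and P values reported above are produced: one simple approach is a two-sample test on each group's change scores. The paper's actual analysis may differ (for example, by adjusting for baseline covariates); the sketch below uses invented data shaped only to echo the reported 4-week PHQ-9 contrast.

```python
# Invented-data sketch: a two-sample comparison of change scores, one
# simple way to produce contrasts like those reported above. The paper's
# actual analysis may differ (e.g., adjusting for baseline covariates).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
therabot_change = rng.normal(-6.1, 4.0, 106)  # e.g., PHQ-9 change, 4 weeks
waitlist_change = rng.normal(-2.6, 4.0, 104)

t, p = stats.ttest_ind(therabot_change, waitlist_change)
print(f"mean change: {therabot_change.mean():.2f} (Therabot) "
      f"vs {waitlist_change.mean():.2f} (waitlist)")
print(f"t = {t:.2f}, p = {p:.2g}")
```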
[6]
AI-powered therapy chatbot shows significant mental health benefits
Dartmouth College, March 27, 2025. Dartmouth researchers conducted the first clinical trial of a therapy chatbot powered by generative AI and found that the software resulted in significant improvements in participants' symptoms, according to results published March 27 in the New England Journal of Medicine AI. People in the study also reported they could trust and communicate with the system, known as Therabot, to a degree that is comparable to working with a mental-health professional.

The trial consisted of 106 people from across the United States diagnosed with major depressive disorder, generalized anxiety disorder, or an eating disorder. Participants interacted with Therabot through a smartphone app by typing out responses to prompts about how they were feeling or initiating conversations when they needed to talk.

People diagnosed with depression experienced a 51% average reduction in symptoms, leading to clinically significant improvements in mood and overall well-being, the researchers report. Participants with generalized anxiety reported an average reduction in symptoms of 31%, with many shifting from moderate to mild anxiety, or from mild anxiety to below the clinical threshold for diagnosis. Among those at risk for eating disorders -- who are traditionally more challenging to treat -- Therabot users showed a 19% average reduction in concerns about body image and weight, which significantly outpaced a control group that also was part of the trial.

The researchers conclude that while AI-powered therapy is still in critical need of clinician oversight, it has the potential to provide real-time support for the many people who lack regular or immediate access to a mental-health professional.

"The improvements in symptoms we observed were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," says Nicholas Jacobson, the study's senior author and associate professor of biomedical data science and psychiatry in Dartmouth's Geisel School of Medicine.

"There is no replacement for in-person care, but there are nowhere near enough providers to go around," Jacobson says. For every available provider in the United States, there's an average of 1,600 patients with depression or anxiety alone, he says. "We would like to see generative AI help provide mental health support to the huge number of people outside the in-person care system. I see the potential for person-to-person and software-based therapy to work together," says Jacobson, who is the director of the treatment development and evaluation core at Dartmouth's Center for Technology and Behavioral Health.

Michael Heinz, the study's first author and an assistant professor of psychiatry at Dartmouth, says the trial results also underscore the critical work ahead before generative AI can be used to treat people safely and effectively. "While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter," says Heinz, who also is an attending psychiatrist at Dartmouth Hitchcock Medical Center in Lebanon, N.H. "We still need to better understand and quantify the risks associated with generative AI used in mental health contexts."

Therabot has been in development in Jacobson's AI and Mental Health Lab at Dartmouth since 2019. The process included continuous consultation with psychologists and psychiatrists affiliated with Dartmouth and Dartmouth Health.
When people initiate a conversation with the app, Therabot answers with natural, open-ended text dialog based on an original training set the researchers developed from current, evidence-based best practices for psychotherapy and cognitive behavioral therapy, Heinz says. For example, if a person with anxiety tells Therabot they have been feeling very nervous and overwhelmed lately, it might respond, "Let's take a step back and ask why you feel that way." If Therabot detects high-risk content such as suicidal ideation during a conversation with a user, it will provide a prompt to call 911, or contact a suicide prevention or crisis hotline, with the press of an onscreen button.

The clinical trial provided the participants randomly selected to use Therabot with four weeks of unlimited access. The researchers also tracked the control group of 104 people with the same diagnosed conditions who had no access to Therabot. Almost 75% of the Therabot group were not under pharmaceutical or other therapeutic treatment at the time. The app asked about people's well-being, personalizing its questions and responses based on what it learned during its conversations with participants. The researchers evaluated conversations to ensure that the software was responding within best therapeutic practices.

After four weeks, the researchers gauged a person's progress through standardized questionnaires clinicians use to detect and monitor each condition. The team did a second assessment after another four weeks, when participants could initiate conversations with Therabot but no longer received prompts. After eight weeks, all participants using Therabot experienced a marked reduction in symptoms that exceeds what clinicians consider statistically significant, Jacobson says. These differences represent robust, real-world improvements that patients would likely notice in their daily lives, Jacobson says. Users engaged with Therabot for an average of six hours throughout the trial, the equivalent of about eight therapy sessions, he says.

"Our results are comparable to what we would see for people with access to gold-standard cognitive therapy with outpatient providers," Jacobson says. "We're talking about potentially giving people the equivalent of the best treatment you can get in the care system over shorter periods of time."

Critically, people reported a degree of "therapeutic alliance" in line with what patients report for in-person providers, the study found. Therapeutic alliance relates to the level of trust and collaboration between a patient and their caregiver and is considered essential to successful therapy. One indication of this bond is that people not only provided detailed responses to Therabot's prompts -- they frequently initiated conversations, Jacobson says. Interactions with the software also showed upticks at times associated with unwellness, such as in the middle of the night.

"We did not expect that people would almost treat the software like a friend. It says to me that they were actually forming relationships with Therabot," Jacobson says. "My sense is that people also felt comfortable talking to a bot because it won't judge them."

The Therabot trial shows that generative AI has the potential to increase a patient's engagement and, importantly, continued use of the software, Heinz says. "Therabot is not limited to an office and can go anywhere a patient goes.
It was available around the clock for challenges that arose in daily life and could walk users through strategies to handle them in real time," Heinz says. "But the feature that allows AI to be so effective is also what confers its risk -- patients can say anything to it, and it can say anything back."

The development and clinical testing of these systems need to have rigorous benchmarks for safety, efficacy, and the tone of engagement, and need to include the close supervision and involvement of mental-health experts, Heinz says. "This trial brought into focus that the study team has to be equipped to intervene -- possibly right away -- if a patient expresses an acute safety concern such as suicidal ideation, or if the software responds in a way that is not in line with best practices," he says. "Thankfully, we did not see this often with Therabot, but that is always a risk with generative AI, and our study team was ready."

In evaluations of earlier versions of Therabot more than two years ago, more than 90% of responses were consistent with therapeutic best practices, Jacobson says. That gave the team the confidence to move forward with the clinical trial. "There are a lot of folks rushing into this space since the release of ChatGPT, and it's easy to put out a proof of concept that looks great at first glance, but the safety and efficacy is not well established," Jacobson says. "This is one of those cases where diligent oversight is needed, and providing that really sets us apart in this space."

Source: Dartmouth College

Journal reference: Heinz, M. V., et al. (2025). Randomized Trial of a Generative AI Chatbot for Mental Health Treatment. NEJM AI. doi.org/10.1056/aioa2400802.
[7]
First clinical trial of an AI therapy chatbot yields significant mental health benefits
Dartmouth researchers conducted the first clinical trial of a therapy chatbot powered by generative AI and found that the software resulted in significant improvements in participants' symptoms, according to results published in the New England Journal of Medicine AI. People in the study also reported they could trust and communicate with the system, known as Therabot, to a degree that is comparable to working with a mental-health professional.

The trial consisted of 106 people from across the United States diagnosed with major depressive disorder, generalized anxiety disorder, or an eating disorder. Participants interacted with Therabot through a smartphone app by typing out responses to prompts about how they were feeling or initiating conversations when they needed to talk.

People diagnosed with depression experienced a 51% average reduction in symptoms, leading to clinically significant improvements in mood and overall well-being, the researchers report. Participants with generalized anxiety reported an average reduction in symptoms of 31%, with many shifting from moderate to mild anxiety, or from mild anxiety to below the clinical threshold for diagnosis. Among those at risk of eating disorders -- who are traditionally more challenging to treat -- Therabot users showed a 19% average reduction in concerns about body image and weight, which significantly outpaced a control group that was also part of the trial.

The researchers conclude that while AI-powered therapy is still in critical need of clinician oversight, it has the potential to provide real-time support for the many people who lack regular or immediate access to a mental-health professional.

"The improvements in symptoms we observed were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," says Nicholas Jacobson, the study's senior author and an associate professor of biomedical data science and psychiatry at Dartmouth's Geisel School of Medicine. "There is no replacement for in-person care, but there are nowhere near enough providers to go around," Jacobson says. For every available provider in the United States, there's an average of 1,600 patients with depression or anxiety alone, he says. "We would like to see generative AI help provide mental health support to the huge number of people outside the in-person care system. I see the potential for person-to-person and software-based therapy to work together," says Jacobson, who is the director of the treatment development and evaluation core at Dartmouth's Center for Technology and Behavioral Health.

Michael Heinz, the study's first author and an assistant professor of psychiatry at Dartmouth, says the trial results also underscore the critical work ahead before generative AI can be used to treat people safely and effectively. "While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter," says Heinz, who also is an attending psychiatrist at Dartmouth Hitchcock Medical Center in Lebanon, N.H. "We still need to better understand and quantify the risks associated with generative AI used in mental health contexts."

Therabot has been in development in Jacobson's AI and Mental Health Lab at Dartmouth since 2019. The process included continuous consultation with psychologists and psychiatrists affiliated with Dartmouth and Dartmouth Health.
When people initiate a conversation with the app, Therabot answers with natural, open-ended text dialogue based on an original training set the researchers developed from current, evidence-based best practices for psychotherapy and cognitive behavioral therapy, Heinz says. For example, if a person with anxiety tells Therabot they have been feeling very nervous and overwhelmed lately, it might respond, "Let's take a step back and ask why you feel that way." If Therabot detects high-risk content such as suicidal ideation during a conversation with a user, it will provide a prompt to call 911, or contact a suicide prevention or crisis hotline, with the press of an onscreen button.

The clinical trial provided the participants randomly selected to use Therabot with four weeks of unlimited access. The researchers also tracked the control group of 104 people with the same diagnosed conditions who had no access to Therabot. Almost 75% of the Therabot group were not under pharmaceutical or other therapeutic treatment at the time. The app asked about people's well-being, personalizing its questions and responses based on what it learned during its conversations with participants. The researchers evaluated conversations to ensure that the software was responding within best therapeutic practices.

After four weeks, the researchers gauged a person's progress through standardized questionnaires clinicians use to detect and monitor each condition. The team did a second assessment after another four weeks, when participants could initiate conversations with Therabot but no longer received prompts. After eight weeks, all participants using Therabot experienced a marked reduction in symptoms that exceeds what clinicians consider statistically significant, Jacobson says. These differences represent robust, real-world improvements that patients would likely notice in their daily lives, Jacobson says. Users engaged with Therabot for an average of six hours throughout the trial, the equivalent of about eight therapy sessions, he says.

"Our results are comparable to what we would see for people with access to gold-standard cognitive therapy with outpatient providers," Jacobson says. "We're talking about potentially giving people the equivalent of the best treatment you can get in the care system over shorter periods of time."

Critically, people reported a degree of "therapeutic alliance" in line with what patients report for in-person providers, the study found. Therapeutic alliance relates to the level of trust and collaboration between a patient and their caregiver and is considered essential to successful therapy. One indication of this bond is that people not only provided detailed responses to Therabot's prompts -- they frequently initiated conversations, Jacobson says. Interactions with the software also showed upticks at times associated with unwellness, such as in the middle of the night.

"We did not expect that people would almost treat the software like a friend. It says to me that they were actually forming relationships with Therabot," Jacobson says. "My sense is that people also felt comfortable talking to a bot because it won't judge them."

The Therabot trial shows that generative AI has the potential to increase a patient's engagement and, importantly, continued use of the software, Heinz says. "Therabot is not limited to an office and can go anywhere a patient goes.
It was available around the clock for challenges that arose in daily life and could walk users through strategies to handle them in real time," Heinz says. "But the feature that allows AI to be so effective is also what confers its risk -- patients can say anything to it, and it can say anything back."

The development and clinical testing of these systems need to have rigorous benchmarks for safety, efficacy, and the tone of engagement, and need to include the close supervision and involvement of mental-health experts, Heinz says. "This trial brought into focus that the study team has to be equipped to intervene -- possibly right away -- if a patient expresses an acute safety concern such as suicidal ideation, or if the software responds in a way that is not in line with best practices," he says. "Thankfully, we did not see this often with Therabot, but that is always a risk with generative AI, and our study team was ready."

In evaluations of earlier versions of Therabot more than two years ago, more than 90% of responses were consistent with therapeutic best practices, Jacobson says. That gave the team the confidence to move forward with the clinical trial. "There are a lot of folks rushing into this space since the release of ChatGPT, and it's easy to put out a proof of concept that looks great at first glance, but the safety and efficacy is not well established," Jacobson says. "This is one of those cases where diligent oversight is needed, and providing that really sets us apart in this space."
[8]
AI-powered therapy shows shocking results in mental health study
While some believe AI can be a helpful tool, others argue that the human touch of therapists and psychologists is irreplaceable. Despite this debate, the latest research from Dartmouth suggests that AI-powered therapy tools can have a meaningful impact. Their study presents the first-ever clinical trial of a generative AI-powered therapy chatbot, Therabot, showing highly encouraging results.

Therabot demonstrated significant improvements in symptoms among participants diagnosed with major depressive disorder or generalized anxiety disorder, or at high risk for an eating disorder. The trial involved 106 participants across the U.S. who engaged with Therabot via a smartphone app, responding to prompts or initiating conversations as needed. A control group of 104 individuals with similar diagnoses did not have access to Therabot.

The results were striking: participants using Therabot reported a 51% decrease in depressive symptoms, a 31% reduction in anxiety symptoms, and a 19% decline in concerns related to body image and weight.
[9]
Clinical test says AI can offer therapy as good as a certified expert
AI is being heavily pushed into the field of research and medical science. From drug discovery to diagnosing diseases, the results have been fairly encouraging. But when it comes to tasks where behavioral science and nuance come into the picture, things can go haywire. It seems an expert-tuned approach is the best way forward.

Dartmouth College experts recently conducted the first clinical trial of an AI chatbot designed specifically for providing mental health assistance. Called Therabot, the AI assistant was tested in the form of an app among participants diagnosed with serious mental health problems across the United States.

"The improvements in symptoms we observed were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," notes Nicholas Jacobson, associate professor of biomedical data science and psychiatry at the Geisel School of Medicine.

Massive progress

Broadly, users who engaged with the Therabot app reported a 51% average reduction in depressive symptoms, which helped improve their overall well-being. Many participants went from moderate to lower tiers of clinical anxiety, and some even dropped below the clinical threshold for diagnosis.

As part of a randomized controlled trial (RCT), the team recruited adults diagnosed with major depressive disorder (MDD) or generalized anxiety disorder (GAD), and people at clinically high risk for feeding and eating disorders (CHR-FED). After a spell of four to eight weeks, participants reported positive results and rated the AI chatbot's assistance as "comparable to that of human therapists." For people at risk of eating disorders, the bot helped with an approximately 19% reduction in harmful thoughts about body image and weight issues. Likewise, the figures for generalized anxiety went down by 31% after interacting with the Therabot app.

Users who engaged with the Therabot app exhibited "significantly greater" improvement in symptoms of depression, alongside a reduction in signs of anxiety. The findings of the clinical trial have been published in the March edition of the New England Journal of Medicine - Artificial Intelligence (NEJM AI). "After eight weeks, all participants using Therabot experienced a marked reduction in symptoms that exceed what clinicians consider statistically significant," the experts claim, adding that the improvements are comparable to gold-standard cognitive therapy.

Solving the access problem

"There is no replacement for in-person care, but there are nowhere near enough providers to go around," Jacobson says. He added that there is a lot of scope for in-person and AI-driven assistance to come together and help. Jacobson, who is also the senior author of the study, highlights that AI could improve access to critical help for the vast number of people who can't access in-person healthcare systems.

Michael Heinz, an assistant professor at the Geisel School of Medicine at Dartmouth and lead author of the study, also stressed that tools like Therabot can provide critical assistance in real time. It essentially goes wherever users go and, most importantly, it boosts patient engagement with a therapeutic tool. Both experts, however, raised the risks that come with generative AI, especially in high-stakes situations.
Late in 2024, a lawsuit was filed against Character.AI over an incident involving the death of a 14-year-old boy, who was reportedly encouraged to kill himself by an AI chatbot. Google's Gemini AI chatbot also once told a user that they should die: "This is for you, human. You and only you. You are not special, you are not important, and you are not needed," said the chatbot. Google's AI tools have also fumbled something as simple as the current year, and its AI Overviews search feature once gave harmful tips like adding glue to pizza.

When it comes to mental health counseling, the margin for error gets smaller. The experts behind the latest study are aware of this, especially for individuals at risk of self-harm. As such, they recommend vigilance over the development of such tools and prompt human intervention to fine-tune the responses offered by AI therapists.
A clinical trial of Therabot, an AI-powered therapy chatbot developed by Dartmouth researchers, shows significant improvements in symptoms of depression, anxiety, and eating disorders, rivaling traditional therapy outcomes.
Researchers at Dartmouth College have conducted the first clinical trial of a generative AI-powered therapy chatbot, demonstrating promising results in treating depression, anxiety, and eating disorders. The study, published in NEJM AI, showcases the potential of AI to provide accessible mental health support [1].
The AI tool, named Therabot, was developed over several years by a team of psychiatric researchers and psychologists at Dartmouth's Geisel School of Medicine. Unlike many commercial AI therapy bots, Therabot was trained on carefully curated datasets based on evidence-based practices in cognitive behavioral therapy [2].
The eight-week trial involved 210 participants with symptoms of depression, generalized anxiety disorder, or high risk for eating disorders. Key findings include:

- A 51% average reduction in depression symptoms
- A 31% average reduction in anxiety symptoms
- A 19% average reduction in concerns about body image and weight among those at risk for eating disorders

Participants engaged with Therabot for an average of 6 hours throughout the trial, equivalent to about eight therapy sessions [4].
The study reported high user engagement, with participants sending an average of 260 messages. Users rated their experience positively, with the app receiving high scores for ease of use and effectiveness. Notably, participants reported a degree of "therapeutic alliance" comparable to that experienced with human therapists [5].
While the results are promising, researchers emphasize that AI-powered therapy still requires human oversight. The study included safeguards such as monitoring for high-risk content and human interventions when necessary. Future research will focus on:

- Further trials with greater clinical oversight before the app is made widely available
- Head-to-head comparisons with live providers
- Expanding the approach to other mental health conditions
As the field of AI-assisted mental health care evolves, it holds the potential to complement traditional therapy and expand access to mental health support for millions of people worldwide.