Can artificial intelligence improve clinical trial design? Despite their importance to medicine, over 40% of clinical trials involve flawed protocols. We propose the development of application-specific language models (ASLMs) for clinical trial design across three phases: ASLM development by regulatory agencies, customization by Health Technology Assessment bodies, and deployment to stakeholders. This strategy could enhance trial efficiency, inclusivity, and safety, leading to more representative, cost-effective clinical trials.
As the primary tool for validating new treatments, clinical trials are integral to evidence-based medicine and medical progress, playing a pivotal role in determining the safety and efficacy of new therapeutic interventions, including drugs, medical devices, and clinical procedures. Yet despite their central role and the vast resources devoted to them, studies from Cochrane reviews -- widely regarded as the gold standard -- indicate that over 40% of these trials are 'wasted' due to suboptimal protocols, a situation deemed 'ethically, scientifically, and economically indefensible'. Three of the most common deficiencies concern blinding of patients and personnel, blinding of outcome assessment, and incomplete outcome data (see Table 1 for a comprehensive overview of deficiencies). Although initiatives such as enhanced training and targeted funding have been proposed to overcome these deficiencies, tangible progress remains insufficient. Can artificial intelligence (AI) help address some of these enduring challenges?
Recent advancements in generative AI, particularly in large language models (LLMs) with 'chain of thought' capabilities -- that is, a reasoning approach in which the model generates intermediate steps to solve complex problems, much as humans break a task down and reason through it iteratively -- offer a promising avenue for improving clinical trial design. Early studies suggest that existing LLMs can identify the risk of bias in published randomized controlled trials (RCTs), as well as draft RCT protocol sections, at levels similar to human performance. Frontier AI models have recently been reported to exceed human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA), place among the top entries on the USA Math Olympiad qualifier (AIME), and rank in the 89th percentile on competitive programming challenges (Codeforces).
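To make chain-of-thought prompting concrete in this setting, the sketch below assembles a prompt that asks a model to reason through allocation, blinding, and outcome-data handling before giving a risk-of-bias verdict. The protocol excerpt and prompt wording are illustrative assumptions, not part of any published workflow; the assembled prompt would be sent to an LLM via whichever chat-completion client is available.

```python
# A minimal, illustrative sketch of chain-of-thought prompting for
# risk-of-bias screening. The protocol excerpt and prompt wording are
# hypothetical; sending the prompt to a model is left to the reader's
# preferred API client.

PROTOCOL_EXCERPT = """\
Participants will be allocated to intervention or control by the
recruiting clinician. Outcome assessors are aware of group allocation."""

COT_PROMPT = f"""You are reviewing a randomized controlled trial protocol.
Reason step by step before answering:
1. Identify how participants are allocated to study arms.
2. Check whether patients, personnel, and outcome assessors are blinded.
3. Note any risk of incomplete outcome data.
Then give a final verdict (low risk, some concerns, or high risk of bias)
with a one-sentence justification for each domain.

Protocol excerpt:
{PROTOCOL_EXCERPT}"""

if __name__ == "__main__":
    # Here we only print the assembled prompt; the intermediate steps it
    # requests are what give the model its 'chain of thought'.
    print(COT_PROMPT)
```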
In this Comment, we make two proposals to harness AI's potential for improving clinical trial design. First, we introduce the concept of application-specific language models (ASLMs) for clinical trials. By 'application-specific language models', we mean either LLMs fine-tuned on domain-specific data and enhanced with retrieval-augmented generation (RAG), or purpose-built smaller language models designed specifically for improving clinical trials. We argue that ASLMs can significantly outperform general-purpose LLMs in this specialized domain.
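As a minimal illustration of the RAG component of an ASLM, the sketch below retrieves guideline snippets relevant to a trial-design question and prepends them to the prompt so the model grounds its answer in them. The snippet corpus, the toy bag-of-words 'embedding', and the build_prompt function are illustrative placeholders; a deployed ASLM would pair a fine-tuned model with a curated, indexed corpus of trial guidance.

```python
# Minimal sketch of retrieval-augmented generation (RAG) for an ASLM.
# The corpus, embedding, and prompt template are illustrative placeholders.

import math
from collections import Counter

GUIDELINE_SNIPPETS = [
    "Outcome assessors should be blinded to treatment allocation whenever feasible.",
    "Sample-size calculations must state the assumed effect size and power.",
    "Plans for handling missing outcome data should be pre-specified.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k guideline snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(GUIDELINE_SNIPPETS, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved guidance so the model grounds its answer in it."""
    context = "\n".join(f"- {s}" for s in retrieve(query))
    return (f"Relevant guidance:\n{context}\n\n"
            f"Design question: {query}\n"
            f"Answer, citing the guidance above.")

if __name__ == "__main__":
    print(build_prompt("How should we handle blinding of outcome assessment?"))
```

In a deployed ASLM, the retrieved guidance would come from sources curated by the bodies named in our proposal rather than a hard-coded list, and the assembled prompt would be passed to the fine-tuned model rather than printed.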
Second, we propose a three-step policy approach for developing and implementing these ASLMs (Fig. 1):