This Review explores the state-of-the-art applications of AI in small-molecule drug development since 2019; for insights into research conducted before 2019, readers are encouraged to consult previous comprehensive reviews2,24,25. For more detailed information on AI for natural product drug discovery, please refer to the latest review20. In this Review, we first describe AI-powered drug discovery, from target identification up to synthesis planning, and AI applications within clinical stages of drug development -- including biomarker discovery, drug repurposing, prediction of pharmacokinetic properties and toxicity, and clinical trial conduct. Finally, we discuss the challenges faced by AI-powered drug development and outline future directions for the field. We hope to illuminate a new era of innovation, efficiency and precision in drug development that is expected to expedite delivery of new and better medicines to patients.
In recent years, AI has emerged as a transformative force in the field of drug discovery, revolutionizing traditional methodologies and enhancing efficiency across multiple stages of the process. This section explores the profound impact of AI on various aspects of drug discovery, including target identification, virtual screening, de novo design, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and synthesis planning and the automation of synthesis and drug discovery. By leveraging advanced algorithms and techniques, researchers are now able to accelerate the discovery of novel therapeutic agents, improve the accuracy of predictions and reduce the overall time and costs associated with drug development.
The identification of small-molecule targets, such as proteins or nucleic acids, is a critical process in drug discovery. Traditional methods such as affinity pull-down and whole-genome knockdown screening are widely used but tend to be time-consuming and labor-intensive, with high failure rates.
Advances in AI technology are revolutionizing this field by enabling the analysis of large datasets within complex biological networks. AI facilitates the identification of disease-related molecular patterns and causal relationships by constructing multi-omics data networks, thus enabling the discovery of candidate drug targets. For example, recent research uses NLP techniques (such as word2vec embeddings) to map gene functions into high-dimensional space, enhancing the sensitivity of target identification despite the sparsity of gene function overlap. Nonetheless, efficiently integrating multi-omics data and ensuring the interpretability of AI models remain challenging. Graph deep learning addresses these challenges by merging graph structures with deep learning, focusing on graph nodes related to key features (for example, atom type, charge) to effectively identify candidate targets. A recent study developed an interpretable framework that applies graph attention mechanisms to multi-omics network graphs to predict cancer genes.
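As a simple illustration of the word2vec-style idea described above, the sketch below embeds functional terms and compares genes by the cosine similarity of their averaged term vectors; the gene names, annotation terms and hyperparameters are invented toy assumptions, not drawn from any cited study.

```python
# Minimal sketch, assuming a toy corpus of gene-function annotations (gene and term
# names are hypothetical): word2vec embeds functional terms so that genes sharing
# sparse annotations end up close in vector space. With so little data the result is
# only illustrative; real applications train on large annotation corpora.
import numpy as np
from gensim.models import Word2Vec

gene_annotations = {
    "GENE_A": ["kinase_activity", "apoptosis", "atp_binding"],
    "GENE_B": ["kinase_activity", "cell_cycle", "atp_binding"],
    "GENE_C": ["lipid_metabolism", "membrane_transport"],
}

model = Word2Vec(sentences=list(gene_annotations.values()),
                 vector_size=32, window=5, min_count=1, sg=1, epochs=100)

def gene_vector(terms):
    # Represent a gene as the mean of its functional-term embeddings.
    return np.mean([model.wv[t] for t in terms], axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

va, vb, vc = (gene_vector(t) for t in gene_annotations.values())
print("A vs B:", cosine(va, vb), "| A vs C:", cosine(va, vc))
```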
Furthermore, integrating multi-omics data with scientific and medical literature into knowledge graphs allows AI to discern relationships between genes and disease pathways. Biomedical LLMs, when deeply integrated with biological networks or knowledge graph functions, provide efficient and precise methods for linking diseases, genes and biological processes. For instance, the PandaOmics platform (https://pharma.ai/pandaomics/) successfully utilized multi-omics data and biological network analysis to recognize TRAF2- and NCK-interacting kinase as a potential target for anti-fibrotic therapy, leading to the development of a specific TRAF2- and NCK-interacting kinase inhibitor (INS018_055). However, potential publication biases in the literature suggest a need for supplementary methods to ensure the identification of novel and relevant targets.
Real-world data, such as medical records, self-reports, electronic health records (EHRs) and insurance claims, provide essential contextual information for understanding complex diseases and facilitating target discovery. However, real-world data often contain unstructured text, lack standardization and may include biases, limiting their application in this context. While high-quality, curated datasets are crucial for training models, real-world data are inherently noisy and complicated by the confluence of multiple diseases. Nonetheless, recent studies have shown that, despite these issues, noisy real-world data can be used to train effective models, advancing gene discovery and candidate drug target identification even in scenarios with noisy medical records and non-expert disease labeling. Enhancing model generalizability across diverse populations remains a major challenge, especially for diseases with low labeling or prevalence rates. As real-world and multi-omics data grow richer, utilizing advanced data mining algorithms and expert knowledge will further enhance their integration, significantly improving the success rate of target discovery.
Virtual screening is a critical strategy for efficiently identifying potential lead compounds or drug candidates. The rapid expansion of compound libraries necessitates accelerated virtual screening of ultra-large libraries, prompting advancements in AI technologies for ligand docking. AI-based receptor-ligand docking models can predict ligand spatial transformations, directly generate complex atomic coordinates using algorithms such as equivariant neural networks, and learn the probability density distribution of receptor-ligand distances to generate binding poses. Notably, recent receptor-ligand co-folding networks based on AlphaFold2 and RoseTTAFold show promise in predicting complex structures directly from sequence information. However, they may produce unrealistic ligand conformations owing to insufficient learning of physical constraints, necessitating post-processing (for example, energy minimization) or geometric constraints to optimize docking pose validity. Moreover, deep learning-based binding pose prediction models have yet to outperform physics-based methods in pocket-oriented docking tasks, and they often inadequately consider receptor pocket flexibility. Additionally, predicting precise receptor-ligand interactions remains a challenge. Early machine learning successes in affinity prediction have sparked interest in deep learning models, which may outperform traditional scoring functions by handling both three-dimensional structural and nonstructural data; however, their performance heavily depends on ligand pose accuracy and they are primarily suited to known receptor structures.
When target structures are absent or incomplete, the direct application of docking-based virtual screening is impractical. As an alternative, AI techniques may be used in sequence-based prediction methods. However, such methods often struggle to capture the complexity of three-dimensional protein-ligand interactions, complicating accurate predictions of how binding pose changes affect interaction strength.
While targeted drug development is effective for defined targets, many diseases lack such targets. Phenotype-based virtual screening is thus crucial for diseases with undefined targets (for example, rare diseases) and broadly phenotypic conditions (for example, aging). A recent study used nuclear morphology and machine learning to identify compounds inducing senescence in cancer cells; similar strategies are also promising for antibiotic discovery. However, such models often depend on case-specific phenotypic data and struggle to generalize. Furthermore, AI-based activity prediction relying solely on ligand chemical structures faces challenges such as data sparsity, data imbalance and activity cliffs. Recent studies suggest that integrating related biological information, such as cell morphology and transcriptional profiles, can enhance model performance, offering a new direction for more accurate activity prediction.
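To make the phenotype-based screening idea concrete, the following minimal sketch trains a classifier on nuclear-morphology features of control versus senescent cells and then scores nuclei from a compound-treated well; all feature values and distributions are synthetic assumptions for illustration, not the pipeline of the cited study.

```python
# Minimal sketch, assuming entirely synthetic nuclear-morphology features (area,
# perimeter, eccentricity, mean intensity): a classifier trained on control vs.
# senescent nuclei is applied to a compound-treated well to estimate how many
# nuclei look senescent, i.e. whether the compound is a candidate senescence inducer.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
control = rng.normal([200.0, 55.0, 0.40, 1.00], [20.0, 5.0, 0.05, 0.10], (500, 4))
senescent = rng.normal([320.0, 75.0, 0.60, 0.80], [30.0, 7.0, 0.05, 0.10], (500, 4))
X = np.vstack([control, senescent])
y = np.array([0] * 500 + [1] * 500)

clf = GradientBoostingClassifier().fit(X, y)

# Nuclei segmented from a hypothetical compound-treated well.
treated_well = rng.normal([310.0, 72.0, 0.58, 0.82], [30.0, 7.0, 0.05, 0.10], (200, 4))
print("fraction of nuclei scored senescent:", clf.predict(treated_well).mean())
```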
Current virtual screening models generally focus on specific tasks such as scoring, pose optimization or screening, emphasizing the need to develop universal models capable of handling multiple tasks. Incorporating inductive biases (which refer to the model's inherent tendency to prioritize certain types of solutions over others) or data augmentation (which refers to techniques used to artificially expand the diversity of a training dataset without collecting new data) might improve model generalizability. Furthermore, the exponential growth of commercial compound collections to billions makes comprehensive screening computationally infeasible. Meanwhile, the available molecular libraries cover only a small portion of the druggable chemical space, which continues to expand -- creating both opportunities and challenges in navigating and screening for bioactive molecules.
In response to these challenges, techniques such as active learning and Bayesian optimization effectively address the chemical space search problem and are becoming key to enhancing virtual screening efficiency. The integration of quantum mechanics with AI offers new tools for chemical space exploration, while molecular dynamics simulations add depth to protein-ligand interactions, addressing issues of binding affinity and selectivity to improve model accuracy. Simultaneously, by generating custom virtual libraries for specific targets or compound types, deep generative models substantially narrow search spaces and enhance screening efficiency. For instance, our conditional recurrent neural network generated a custom library that identified an efficient and selective RIPK1 inhibitor in cell and animal models.
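The sketch below illustrates the active-learning loop in its simplest form, using random vectors in place of real fingerprints and a cheap linear function standing in for an expensive docking oracle; it is an assumed toy setup rather than the workflow of any specific study.

```python
# Minimal active-learning sketch, assuming random vectors as stand-ins for compound
# fingerprints and a linear function as a stand-in for an expensive docking oracle:
# a surrogate model iteratively picks the next compounds to "dock", so only a small
# fraction of an ultra-large library is ever scored exactly.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
library = rng.random((10_000, 128))           # stand-in for a fingerprint library
true_score = library @ rng.normal(size=128)   # stand-in for a docking/affinity oracle

labeled = list(rng.choice(len(library), 100, replace=False))
for round_ in range(5):
    surrogate = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
    surrogate.fit(library[labeled], true_score[labeled])
    preds = surrogate.predict(library)
    seen = set(labeled)
    # Greedy acquisition: take the best-predicted compounds not yet "docked".
    picks = [i for i in np.argsort(-preds) if i not in seen][:100]
    labeled.extend(picks)                     # "dock" them and grow the training set
    print(f"round {round_}: best score found so far = {true_score[labeled].max():.2f}")
```

In practice the acquisition step often balances predicted score against model uncertainty (as in Bayesian optimization) rather than picking greedily.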
De novo drug design involves autonomously creating new chemical structures to optimally satisfy desired molecular features. Traditional methods, including structure-based, ligand-based and pharmacophore-based designs, are manual and rely on expert designers and explicit rules. AI, particularly deep learning, has enabled the automated identification of novel structures that meet specific requirements, bypassing traditional expertise. This technology has been successfully applied in developing small-molecule inhibitors, PROTACs, peptides and functional proteins that are validated through wet-lab experiments, ushering in a more efficient and innovative drug discovery era.
In deep learning-driven de novo design (Fig. 2), the molecular generation component is central, normally using chemical language or graph-based models. Chemical language models convert the molecular generation task into sequence generation, for example of SMILES strings ('simplified molecular input line entry system', a notation system that represents a chemical structure in a linear text format). Although extensive pretraining is required and syntactic errors may produce invalid SMILES, these errors can aid model self-correction by filtering improbable samples. Models like long short-term memory models (a type of deep learning model that analyzes sequential data) face information compression bottlenecks, hindering the learning of global sequence attributes, which suggests a need for architectures such as Transformers to better capture global properties. Recent research integrating structured state-space sequence models into chemical language models revealed high chemical-space similarity and alignment with key natural product design features, demonstrating their utility in de novo design.
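As a minimal sketch of a chemical language model, the code below defines a character-level LSTM over a toy SMILES vocabulary and samples a string token by token; the vocabulary, model size and untrained weights are assumptions for illustration, and real models are pretrained on millions of SMILES before sampling.

```python
# Minimal sketch, assuming a toy vocabulary and untrained weights: a character-level
# LSTM chemical language model that samples a SMILES string token by token. Real
# models are pretrained on large SMILES corpora and their outputs validated downstream.
import torch
import torch.nn as nn

vocab = ["<pad>", "<bos>", "<eos>", "C", "c", "N", "O", "1", "(", ")", "="]
stoi = {t: i for i, t in enumerate(vocab)}

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.head(out), state

@torch.no_grad()
def sample(model, max_len=60):
    token = torch.tensor([[stoi["<bos>"]]])
    state, chars = None, []
    for _ in range(max_len):
        logits, state = model(token, state)
        token = torch.multinomial(torch.softmax(logits[0, -1], -1), 1).view(1, 1)
        if token.item() == stoi["<eos>"]:
            break
        chars.append(vocab[token.item()])
    return "".join(chars)

model = SmilesLSTM(len(vocab))
print(sample(model))  # untrained, so the string is random; validity is checked with a toolkit such as RDKit
```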
Conversely, graph-based models represent molecules as graphs, generating structures using autoregressive or non-autoregressive strategies. Autoregressive approaches construct molecules atom by atom, which can lead to chemically implausible intermediates and introduce bias. In contrast, non-autoregressive methods generate entire molecular graphs at once but need extra steps to ensure the graph's validity, as these models' limited perception of molecular topological structures can induce flawed structures.
Given the vastness of the drug-like chemical space, de novo generation often guides design toward target features, using optimization mechanisms such as scoring functions based on metrics including similarity to known active molecules and predicted bioactivity. Incorporating reinforcement learning for iterative optimization is an effective approach, yet designing appropriate scoring functions is challenging because objectives such as synthetic feasibility or drug-likeness are difficult to quantify directly, often leading to unintended consequences. Furthermore, the extensive optimization steps required by reinforcement learning highlight challenges in sample efficiency, which active learning or curriculum learning strategies may mitigate.
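A minimal sketch of the kind of composite scoring function used to steer such optimization is shown below; the equal weights and the choice of drug-likeness (QED) plus Tanimoto similarity to an arbitrary reference active are illustrative assumptions, not a prescribed recipe.

```python
# Minimal sketch, assuming illustrative weights and an arbitrary reference active
# (aspirin): a composite score rewarding drug-likeness (QED) and Tanimoto similarity
# to the reference, the kind of multi-objective reward used to guide reinforcement
# learning during generative design.
from rdkit import Chem, DataStructs
from rdkit.Chem import QED, AllChem

reference = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
ref_fp = AllChem.GetMorganFingerprintAsBitVect(reference, 2, nBits=2048)

def score(smiles: str) -> float:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return 0.0                        # invalid SMILES receive zero reward
    qed = QED.qed(mol)                    # drug-likeness in [0, 1]
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    sim = DataStructs.TanimotoSimilarity(ref_fp, fp)
    return 0.5 * qed + 0.5 * sim          # illustrative weights; real pipelines add predicted activity, synthesizability, etc.

print(score("CC(=O)Nc1ccc(O)cc1"))        # score an example generated molecule
```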
Beyond introducing scoring functions, incorporating constraints -- such as disease-related gene expression features, pharmacophores, protein sequences or structures, binding affinity and protein-ligand interactions -- can also direct models toward generating desired molecules. For instance, our PocketFlow model, conditioned on protein pockets, effectively generated experimentally validated active compounds against HAT1 and YTHDC1 targets, showcasing its drug design capabilities. Additionally, models can refine leads by restricting outputs to specific scaffolds or fragments from desired candidates, albeit at the cost of limiting chemical diversity.
ADMET plays a critical role in determining drug efficacy and safety. While wet-lab evaluations are required for market approval and cannot be fully replaced by simulations, early-stage ADMET predictions can help reduce failures due to poor characteristics. AI has emerged as a valuable tool for predicting ADMET properties using predefined features such as molecular fingerprints or descriptors. For instance, Bayer's in silico ADMET platform applies machine learning techniques such as random forests and support vector machines to descriptors like extended-connectivity (circular) fingerprints to ensure accuracy and relevance. Over the past decades, various descriptors for ADMET prediction have been developed. However, the feature engineering involved in these feature-based methods remains complex and limits generality and flexibility.
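The sketch below illustrates this classic feature-based setup: circular (ECFP-style) fingerprints computed with RDKit feed a random forest classifier. The SMILES strings, binary labels and hyperparameters are placeholders rather than measured ADMET data or any company's actual pipeline.

```python
# Minimal sketch, assuming placeholder SMILES and labels rather than measured ADMET
# data: extended-connectivity (circular) fingerprints from RDKit feed a random forest
# classifier, mirroring the feature-based approach described above.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def ecfp(smiles, radius=2, n_bits=2048):
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Hypothetical training set: SMILES paired with a binary endpoint (e.g. permeable or not).
train_smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1", "CCN(CC)CC"]
train_labels = [1, 0, 1, 1]

X = np.vstack([ecfp(s) for s in train_smiles])
model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, train_labels)

query = ecfp("CCOC(=O)C").reshape(1, -1)
print(model.predict_proba(query))   # predicted class probabilities for a new molecule
```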
Deep learning now drives ADMET prediction, automatically extracting meaningful features from simple input data. Various neural network architectures, including transformers (designed to effectively handle sequential data), convolutional neural networks (a type of deep learning model commonly used for image and video recognition tasks) and, more recently, graph neural networks (deep learning models for processing graph-structured data, such as molecular structures), excel in modeling molecular properties from formats such as SMILES strings and molecular graphs. Among them, SMILES strings offer a compact molecular representation and can distinctly express substructures like branches, rings and chirality, but lack topological awareness -- whereas graph neural networks (like the GeoGNN model) incorporate geometric information, providing superior performance in ADMET prediction. Indeed, a recent study indicates that transformer models using SMILES input struggle with complete structure recognition. For predictions involving properties like toxicity, the performance of representations generated by these models might saturate early in training, showing limited improvement thereafter.
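For contrast with string-based inputs, the following minimal sketch builds a molecular graph from RDKit and applies one round of untrained neighborhood aggregation; the atom features, layer sizes and pooling choice are assumptions that illustrate the general graph neural network idea, not any specific published architecture such as GeoGNN.

```python
# Minimal sketch, assuming toy atom features and untrained weights: atoms are nodes,
# bonds define the adjacency matrix, and one round of neighborhood aggregation plus
# mean pooling yields a molecule-level prediction.
import torch
import torch.nn as nn
from rdkit import Chem

def mol_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    n = mol.GetNumAtoms()
    x = torch.tensor([[a.GetAtomicNum(), a.GetDegree()] for a in mol.GetAtoms()],
                     dtype=torch.float)
    adj = torch.eye(n)                       # self-loops keep each atom's own features
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        adj[i, j] = adj[j, i] = 1.0
    return x, adj

class SimpleGNN(nn.Module):
    def __init__(self, in_dim=2, hidden_dim=32):
        super().__init__()
        self.lin = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x, adj):
        h = torch.relu(adj @ self.lin(x))    # aggregate features from bonded neighbors
        return self.out(h.mean(dim=0))       # pool atoms into one molecule-level output

x, adj = mol_to_graph("CC(=O)Oc1ccccc1C(=O)O")
print(SimpleGNN()(x, adj))                   # untrained output; a real model is fit to measured endpoints
```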
Despite the advances propelled by novel deep learning algorithms, the field still faces challenges. High costs and considerable time investments make labeled data for ADMET prediction scarce, creating a risk of overfitting. Unsupervised and self-supervised learning offer solutions, and while large transformer-based models show promise in other fields, their use in ADMET prediction remains underexplored. A recent study indicates that although the SMILES language does not encode molecular topology directly, carefully designed self-supervised training of contextual transformers equipped with linear attention mechanisms can effectively learn implicit structure-property relationships, bolstering confidence in applying large-scale self-supervised models to ADMET prediction.
Furthermore, molecular representation is critical for AI performance. High-dimensional representations typically provide richer information than low-dimensional ones. However, recent studies indicate that integrating multiple levels of molecular representation can substantially enhance learning, leading to more comprehensive, generalizable and robust ADMET prediction. This suggests that multimodal ADMET models using multiple representations simultaneously hold promise, although the optimal combination of data types is still unresolved.
Interpretability remains a major challenge. Understanding model parameters in ADMET predictions helps reveal the relationships between molecular substructures and properties. Attention mechanisms, which allow a model to focus on important parts of the input data, can enhance interpretability by identifying key atoms or groups. Integrating chemical knowledge can further enhance interpretability, but expanding models to achieve comprehensive chemical understanding remains challenging.
Chemical synthesis, one of the bottlenecks in small-molecule drug discovery, is a highly technical and extremely laborious task. Computer-aided synthesis planning (CASP) and automatic synthesis of organic compounds can help alleviate the burden of repetitive laborious tasks for chemists, enabling them to engage in more innovative work. With the rapid development of AI, the pharmaceutical industry and academia are becoming increasingly interested in achieving intelligence and automation in this process.
CASP has been used as a tool to assist chemists in determining reaction routes via retrosynthesis analysis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors (Fig. 3a). Early CASP programs were rule based (for example, logic and heuristics applied to synthetic analysis, simulation and evaluation of chemical synthesis, and retrosynthesis-based assessment of synthetic accessibility programs). Since then, a range of machine learning techniques, particularly deep learning models, have been developed -- yielding gradual improvements in the synthesis planning of artificial small molecules and natural products. Recently, the transformer model has also been applied to retrosynthetic analysis, prediction of regioselectivity (the preference of a chemical reaction to occur at one particular location over another on a molecule with multiple possible reactive sites) and stereoselectivity (the preference of a reaction to produce one stereoisomer over another when multiple stereoisomeric products are possible), and reaction fingerprint extraction. Concerns regarding the adequacy of purely data-driven AI methods for complex synthesis planning have spurred the development of hybrid expert-AI systems that incorporate chemical rules. Most current deep learning approaches, however, are unexplainable, acting as 'black boxes' that offer limited insight. To tackle this challenge, a new retrosynthesis prediction model, RetroExplainer, was recently introduced with an interpretable deep learning framework that reframes the retrosynthesis task as a molecular assembly process. RetroExplainer has shown superior performance compared with state-of-the-art retrosynthesis methods. Notably, its molecular assembly approach enhances interpretability, enabling transparent decision-making and quantitative attribution.
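To illustrate how transformer-based retrosynthesis is typically framed, the sketch below treats single-step retrosynthesis as SMILES-to-SMILES "translation" with a small untrained transformer; the character vocabulary, molecules and model sizes are toy assumptions, and real systems are trained on large curated reaction datasets.

```python
# Minimal sketch, assuming a toy character vocabulary and an untrained model:
# single-step retrosynthesis framed as sequence-to-sequence "translation", with the
# product SMILES as the source sequence and the precursor SMILES as the target.
import torch
import torch.nn as nn

vocab = ["<pad>", "<bos>", "<eos>"] + list("CNOc1()=#.")
stoi = {c: i for i, c in enumerate(vocab)}

def encode(smiles):
    return torch.tensor([[stoi[c] for c in smiles]])   # batch of one sequence of token ids

emb = nn.Embedding(len(vocab), 64)
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
head = nn.Linear(64, len(vocab))

product = encode("CC(=O)Nc1ccc(O)cc1")   # target molecule (paracetamol, as a toy example)
decoded_so_far = encode("CC(=O)")        # partial precursor SMILES generated so far
logits = head(model(emb(product), emb(decoded_so_far)))
print(logits.shape)  # (1, decoded length, vocab size): next-token scores; training uses reaction datasets
```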
Automated synthesis of organic compounds represents a cutting-edge frontier across chemistry-related fields (Fig. 3b), including medicinal chemistry. An optimal automated synthesis platform would seamlessly integrate and streamline various components of the chemical development process, including CASP, automated experiment setup and optimization, and robotically executed chemical synthesis, separation and purification. Recently, deep learning-powered automated flow chemistry and solid-phase synthesis techniques for pharmaceutical compound synthesis have gained considerable attention. In particular, automated synthesis combined with designing, testing and analyzing technologies forms an automated central process of drug discovery called the design-make-test-analysis (DMTA) cycle. By leveraging deep learning, the efficiency of the DMTA cycle has been substantially improved, accelerating the discovery of hit and lead compounds. For example, using an AI-powered DMTA platform with deep learning for molecular design and microfluidics for on-chip chemical synthesis, liver X receptor agonists were generated from scratch. In addition, LLMs are believed to 'understand' human natural language, enabling automation platforms to provide tailored solutions for specific challenges based on concise inputs from researchers. Although automated synthesis and the automated DMTA cycle hold great promise, their development is still in its infancy. Many technical challenges remain, including the need to reduce solid formation to avoid blockage, predict solubility in nonaqueous solvents and at different temperatures, estimate optimal purification methods and optimize multistep reactions.
Following the planning and synthesis of new drug compounds, AI technology facilitates the in vivo validation of the mechanism of action (MOA) of new drugs. In high-content screening, by monitoring real-time changes in omics data, AI technologies can generalize these features and develop models capable of deciphering the molecular and cellular MOA of a new compound and its associated pharmacokinetic, pharmacodynamic, toxicological and bioavailability properties (Fig. 4).