We introduce CRISPR-GPT, a solution that combines the strengths of LLMs with domain-specific knowledge, chain-of-thought reasoning, instruction fine-tuning, retrieval techniques and tools. CRISPR-GPT is centred around LLM-powered planning and execution agents (Fig. 1). This system leverages the reasoning abilities of general-purpose LLMs and multi-agent collaboration for task decomposition, constructing state machines and automated decision-making (Fig. 2a). It draws upon expert knowledge from leading practitioners and peer-reviewed published literature in gene editing for retrieval-augmented generation (RAG)13.
CRISPR-GPT supports four major gene-editing modalities and 22 gene-editing experiment tasks (Fig. 1 and Supplementary Table 1). It offers tunable levels of automation via three modes: Meta, Auto and Q&A, designed to accommodate users ranging from novice PhD-level scientists new to gene editing to domain experts seeking more efficient, automated solutions for selected tasks (Fig. 1). The 'Meta mode' is designed for beginner researchers, guiding them through a sequence of essential tasks, from selecting CRISPR systems and delivery methods to designing gRNAs, predicting off-target effects, generating experiment protocols and analysing data. Throughout this decision-making process, CRISPR-GPT interacts with users at every step, providing instructions and seeking clarification when needed. The 'Auto mode' caters to advanced researchers and does not adhere to a predefined task order. Users submit a freestyle request, and the LLM Planner decomposes it into tasks, manages their interdependence, builds a customized workflow and executes it automatically. It fills in missing information on the basis of the initial inputs and explains its decisions and thought process, allowing users to monitor and adjust the process. The 'Q&A mode' supports users with on-demand scientific inquiries about gene editing.
To assess the AI agent's capabilities to perform gene-editing research, we compiled an evaluation test set, Gene-editing bench, from both public sources and human experts (details in Supplementary Note C). This test set covers a variety of gene-editing tasks (Fig. 1). By using the test set, we performed extensive evaluation of CRISPR-GPT's capabilities in major gene-editing research tasks, such as experiment planning, delivery selection, single guide (sg)RNA design and experiment troubleshooting. In addition, we invited human experts to perform a thorough user experience evaluation of CRISPR-GPT and collected valuable human feedback.
Further, we deployed CRISPR-GPT in real-world wet labs. Using CRISPR-GPT as an AI co-pilot, we demonstrate a fully AI-guided knockout (KO) of four genes, TGFβR1, SNAI1, BAX and BCL2L1, using CRISPR-Cas12a in a human lung adenocarcinoma cell line, as well as AI-guided CRISPR-dCas9 epigenetic activation of two genes, NCR3LG1 and CEACAM1, in a human melanoma model. All these wet-lab experiments were carried out by junior researchers unfamiliar with gene editing. Both experiments succeeded on the first attempt, confirmed not only by editing efficiencies but also by biologically relevant phenotypes and protein-level validation, highlighting the potential of LLM-guided biological research.
CRISPR-GPT is a multi-agent, compositional system involving a team of LLM-based agents, including an LLM Planner agent, a User-proxy agent, Task executor agents and Tool provider agents (Fig. 2a). These components are powered by LLMs to interact with one another as well as the human user. We also refer to the full system as an 'agent' to encapsulate the overall functionalities.
To automate biological experiment design and analysis, we view the overall problem as sequential decision-making. This perspective frames the interaction between the user and the automated system as a series of decision-making steps, each essential for progressing towards the ultimate goal. Take the Auto mode for example. A user can initiate the process with a meta-request, for example, "I want to knock out the human TGFβR1 gene in A549 lung cancer cells". In response, the agent's LLM Planner will analyse the user's request, drawing on its extensive internal knowledge base via retrieval techniques. Leveraging the reasoning abilities of the base LLM, the Planner generates a chain-of-thought reasoning path and chooses an optimal action from a set of plausible ones while following expert-written guidelines. Consequently, the Planner breaks down the user's request into a sequence of discrete tasks, for example, 'CRISPR-Cas system selection' and 'gRNA design for knockout', while managing interdependencies among these tasks. Each individual task is solved by an LLM-powered state machine, via the Task executor, entailing a sequence of states to progress towards the specific goal. After the meta-task decomposition, the Task executor will chain the state machines of the corresponding tasks together into a larger state machine and begin the execution process, systematically addressing each task in sequence to ensure that the experiment's objectives are met efficiently and effectively (Fig. 2a).
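The task decomposition and state-machine chaining described above can be sketched as follows; the class, task and state names are illustrative assumptions, not CRISPR-GPT's actual implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of chaining per-task state machines into one workflow,
# as described in the text; names and states are assumptions, not the
# actual CRISPR-GPT internals.

@dataclass
class TaskStateMachine:
    name: str
    states: list   # ordered decision states within this task
    current: int = 0

    def advance(self):
        """Move to the next state; return True while states remain."""
        self.current += 1
        return self.current < len(self.states)

# A hypothetical library of per-task states:
STATE_LIBRARY = {
    "CRISPR-Cas system selection": ["recommend_system", "confirm"],
    "gRNA design for knockout": ["locate_target", "rank_gRNAs", "confirm"],
}

def chain_tasks(plan):
    """Concatenate the state machines of the planned tasks and run them."""
    machines = [TaskStateMachine(t, STATE_LIBRARY[t]) for t in plan]
    for m in machines:
        while m.advance():
            pass  # here the User-proxy agent would collect input per state
    return machines

# Plan produced from the meta-request in the text:
workflow = chain_tasks(["CRISPR-Cas system selection",
                        "gRNA design for knockout"])
```

The executor thus walks each task's states in order before moving to the next task in the chain.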
The User-proxy agent is responsible for guiding the user throughout the decision-making process via multiple rounds of textual interactions (typical user interactions required by each task detailed in Supplementary Table 2). At each decision point, the internal state machine presents a 'state variable' to the User-proxy agent, which includes the current task instructions, and specifies any necessary input from the user to proceed. The User-proxy agent then interprets this state given the user interactions and makes informed decisions as input to the Task executor on behalf of the user. Subsequently, the User-proxy agent receives feedback from the Task executor, including the task results and the reasoning process that led to those outcomes. Concurrently, the User-proxy agent continues to interact with the user and provides them with instructions, continuously integrating their feedback to ensure alignment with the user's objectives (detailed in Methods; Fig. 2a and Supplementary Fig. 1).
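At a single decision point, the 'state variable' handed to the User-proxy agent might take a minimal shape like the following sketch (the field names are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of the 'state variable' a task state machine presents to
# the User-proxy agent at each decision point; field names are assumptions.

@dataclass
class StateVariable:
    task: str                      # task this state belongs to
    instructions: str              # current instructions shown to the user
    required_input: Optional[str]  # input the user must supply, if any

def needs_user_input(state: StateVariable) -> bool:
    """The User-proxy agent queries the user only when input is required."""
    return state.required_input is not None

state = StateVariable(
    task="delivery method selection",
    instructions="Confirm the target cell type before choosing a method.",
    required_input="target cell type",
)
```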
To enhance the LLM with domain knowledge, we enable the CRISPR agent to retrieve and synthesize information from published protocols, peer-reviewed research papers and expert-written guidelines, and to utilize external tools and conduct web searches via Tool provider agents (Fig. 2a).
For an end-to-end gene-editing workflow, CRISPR-GPT typically constructs a chain of tasks that includes selecting the appropriate CRISPR system, recommending delivery methods, designing gRNAs, predicting off-target effects, selecting experimental protocols, planning validation assays and performing data analysis (Fig. 2b). The system's modular architecture facilitates easy integration of additional functionalities and tools. CRISPR-GPT serves as a prototype LLM-powered AI co-pilot for scientific research, with potential applications extending beyond gene editing.
CRISPR-GPT automates gene-editing research via several key functionalities. For each functionality, we discuss the agentic implementation and evaluation results.
The Task planner agent is charged with directing the entire workflow and breaking down the user's meta-request into a task chain (Fig. 2b). While the Planner selects and follows a predefined workflow in the Meta mode, it is able to take in freestyle user requests and auto-build a customized workflow in the Auto mode. For example, a user may only need part of the pre-designed workflow, including CRISPR-Cas system selection, delivery method selection, gRNA design and experimental protocol selection before the experiment. The Task planner agent then extracts the right information from the user request and assembles a customized workflow to suit the user's needs (Fig. 3a). To evaluate CRISPR-GPT's ability to correctly lay out gene-editing tasks and manage intertask dependence, we compiled a planning test set, as part of the Gene-editing bench, with user requests and golden answers curated by human experts. Using this test set, we evaluated CRISPR-GPT in comparison with prompted general LLMs, showing that CRISPR-GPT outperforms general LLMs in planning gene-editing tasks (Fig. 3b). The CRISPR-GPT agent driven by GPT-4o scored over 0.99 in accuracy, precision, recall and F1 score, and had <0.05 normalized Levenshtein distance between agent-generated plans and golden answers (Fig. 3b). For an extensive description of the test set and evaluation, please see Supplementary Note C1.
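The normalized Levenshtein distance reported for the planning evaluation can be computed over task sequences as in this sketch; normalizing by the longer sequence length is an assumption about the exact metric definition used.

```python
# Sketch of normalized Levenshtein distance between an agent-generated task
# plan and the expert 'golden answer'; normalizing by the longer sequence
# length is an assumption about the exact metric definition.

def levenshtein(a, b):
    """Edit distance between two task sequences (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (x != y))
    return dp[-1]

def normalized_levenshtein(pred, gold):
    return levenshtein(pred, gold) / max(len(pred), len(gold), 1)

gold = ["system selection", "delivery selection", "gRNA design"]
pred = ["system selection", "gRNA design"]  # one task missing
dist = normalized_levenshtein(pred, gold)   # 1 edit over 3 tasks
```

A distance of 0 means the generated plan matches the golden task chain exactly.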
We present and evaluate the delivery agent of CRISPR-GPT (Fig. 4a,b). Delivery is a critical step in all gene-editing experiments. CRISPR-GPT equips the LLM with expert-tailored instructions and external tools to choose delivery methods. Specifically, the agent first tries to understand the biological system that the user is planning to edit. It extracts keywords for the target cells/tissues/organisms, performs a Google search and summarizes the results. Then, given its own knowledge and the search results, CRISPR-GPT matches the user's case with a major biological category (cell lines, primary cells, in vivo and so on), which reduces the possible options to a focused set of candidate methods. Next, CRISPR-GPT performs a literature search with user- and method-specific keywords, and ranks the candidate methods on the basis of citation metrics to suggest a primary and a secondary delivery method (Fig. 4a). To evaluate the performance of this module, we compiled test cases comprising 50 biological systems as part of the Gene-editing bench. For each case, we invited three human experts to score potential delivery options and utilized those as ground truth. We then evaluated the output of CRISPR-GPT and baseline models by comparing them with the pre-compiled ground-truth score sheet. We found that CRISPR-GPT outperforms the baseline GPT-4 and GPT-3.5-turbo models (Fig. 4b). The agent has a substantial edge on difficult tasks, such as those involving hard-to-transfect cell lines and primary cell types. We also noticed that including an additional literature search step improves the agent's performance only moderately. More details about the delivery selection evaluation can be found in Supplementary Note C2.
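The final ranking step described above can be sketched as follows; the category table, method names and citation counts are illustrative placeholders, not CRISPR-GPT's actual knowledge base.

```python
# Minimal sketch of the described ranking step: candidate delivery methods
# for a matched biological category are ordered by literature citation
# counts. The category table and counts are illustrative placeholders.

CANDIDATES_BY_CATEGORY = {
    "cell line": ["lipofection", "electroporation", "lentiviral transduction"],
    "primary cell": ["electroporation", "AAV", "lentiviral transduction"],
}

def rank_delivery_methods(category, citation_counts, top_k=2):
    """Return primary and secondary suggestions for the matched category."""
    candidates = CANDIDATES_BY_CATEGORY[category]
    ranked = sorted(candidates, key=lambda m: citation_counts.get(m, 0),
                    reverse=True)
    return ranked[:top_k]

# e.g. counts summarized from the literature search step:
counts = {"lentiviral transduction": 120,
          "electroporation": 85,
          "lipofection": 40}
suggestions = rank_delivery_methods("cell line", counts)
```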
Good gRNA design is crucial for the success of CRISPR experiments. Various gRNA design tools and software, such as CRISPick and CHOPCHOP, are available. However, we believe there are two key challenges in general usage: (1) choosing a trustworthy source and (2) difficulty in quickly identifying gRNAs that suit specific user requirements or experiment contexts, often requiring lengthy sorting, ranking or literature review. To address these issues, we utilized pre-designed gRNA tables from CRISPick, a reputable and widely used tool. We leverage the reasoning capabilities of LLMs to accurately identify regions of interest and quickly extract relevant gRNAs. This approach is similar to the recently proposed 'chain-of-tables' methodology (Fig. 4c, Extended Data Fig. 1a, and Supplementary Demo Videos 1 and 2). To evaluate the ability of CRISPR-GPT to correctly retrieve gRNAs, we compiled a gRNA design test set with ground truth from human experts (detailed in Supplementary Note C3). The CRISPR-GPT agent outperforms the baseline LLMs in accurately selecting gRNA design actions and configuring the arguments (Fig. 4d).
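The table-filtering step can be sketched as below, in the spirit of the chain-of-tables approach; the column names, gRNA sequences and ranks are illustrative, not the actual CRISPick output schema.

```python
# Sketch of extracting context-relevant guides from a pre-designed gRNA
# table; column names, sequences and ranks are illustrative placeholders,
# not the actual CRISPick schema.

grna_table = [
    {"gene": "BRD4", "exon": 3,  "rank": 2, "seq": "GTTAGCCATAGTCCAGGACG"},
    {"gene": "BRD4", "exon": 12, "rank": 1, "seq": "AACTGGTCATCAGGCTTACA"},
    {"gene": "BRD4", "exon": 4,  "rank": 5, "seq": "TTGACCAGTAGGCATCCTAA"},
]

def select_gRNAs(table, gene, exons_of_interest, n=2):
    """Restrict to the region of interest, then order by precomputed rank."""
    hits = [r for r in table
            if r["gene"] == gene and r["exon"] in exons_of_interest]
    return sorted(hits, key=lambda r: r["rank"])[:n]

# Restrict guides to exons encoding key functional domains:
picked = select_gRNAs(grna_table, "BRD4", {3, 4})
```

Restricting to a region of interest before ranking is what lets context (for example, functionally important exons) override a global efficiency rank.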
Further, we picked a real-world test case from a cancer biology study, in which many highly ranked gRNA designs did not generate biological phenotypes, even when their editing efficiencies were high. Instead, the authors of the study had to design gRNAs manually against exons encoding important functional domains within a gene, and these exon-selected gRNAs induced the expected cancer-killing effects. We tested CRISPR-GPT for designing gRNAs targeting the BRD4 gene from this study, and compared the results with those generated by CRISPick and CHOPCHOP (Extended Data Fig. 1). CRISPR-GPT was uniquely able to select the key exons, Exon3-4, within BRD4. In contrast, gRNAs designed by CRISPick or CHOPCHOP would likely be ineffective, as 7 out of 8 gRNAs mapped to non-essential exons (Extended Data Fig. 1). Taken together, our results support the benefit and validity of this module.
CRISPR-GPT provides specific suggestions on the choice of CRISPR system, experimental protocol and validation protocol by leveraging the LLM's reasoning ability and retrieving information from an expert-reviewed knowledge base. It also offers automated gRNA off-target prediction, primer design for validation experiments, and data analysis. In particular, the agent provides fully automated solutions to run external software, such as Primer3, CRISPRitz and CRISPResso2 (Supplementary Table 1). We focused on implementing these tools as they are considered the gold standard for their respective tasks and have been extensively validated in previous work.
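As a hedged illustration of how such a tool wrapper might work, the sketch below assembles (but does not execute) a CRISPResso2 command line; the flag names follow the public CRISPResso2 CLI, while the file names and sequences are placeholders.

```python
# Hypothetical tool-wrapper sketch: assemble (but do not execute) a
# CRISPResso2 command for amplicon-sequencing analysis. Flag names follow
# the public CRISPResso2 CLI; file names and sequences are placeholders.

def build_crispresso_cmd(fastq_r1, amplicon_seq, guide_seq, name):
    return [
        "CRISPResso",
        "--fastq_r1", fastq_r1,          # raw NGS reads uploaded by the user
        "--amplicon_seq", amplicon_seq,  # reference amplicon around the cut site
        "--guide_seq", guide_seq,        # spacer sequence, without the PAM
        "--name", name,                  # label for the output report
    ]

cmd = build_crispresso_cmd(
    fastq_r1="sample_R1.fastq.gz",       # placeholder file name
    amplicon_seq="ACGT" * 50,            # placeholder amplicon sequence
    guide_seq="GACGTTAAGCGGTTCAACCT",    # placeholder 20-nt spacer
    name="TGFBR1_KO",
)
# The Task executor would then run this command via subprocess and
# return the analysis report to the user.
```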
General-purpose LLMs may possess broad knowledge but often lack the deep understanding of science needed to solve research problems. To enhance the CRISPR-GPT agent's capacity in answering advanced research questions, we build a Q&A mode that synthesizes information from multiple resources, including published literature, established protocols and discussions between human scientists, utilizing a combination of RAG technique, a fine-tuned specialized model and a general LLM (for which we picked GPT-4o; Methods).
To enhance the Q&A mode's capacity to 'think' like a scientist for problem solving, we sought to train a specialized language model using real scientific discussions among domain experts. The fine-tuned model is used as one of multiple sources of knowledge for the Q&A mode (Fig. 4e). To this end, we collected 11 years of open-forum discussions from a public Google Discussion Group on CRISPR gene editing, starting from 2013 (Supplementary Note B). The discussion group involved a diverse cohort of scientists worldwide. This dataset, comprising ~4,000 discussion threads, was curated into an instructional dataset with over 3,000 question-and-answer pairs (Supplementary Note B). Using this dataset, we fine-tuned an 8-billion-parameter LLM on the basis of the Llama3-instruct model. The fine-tuned model, which we call CRISPR-Llama3, shows improved ability on gene-editing questions, outperforming the baseline model on basic questions by a moderate 8% and on real-world research questions by ~20% (Supplementary Figs. 2 and 3). We integrated this fine-tuned LLM into the Q&A mode as a 'brainstorming source', enabling the agent to generate ideas like a human scientist and provide a second opinion for difficult queries (Fig. 4e).
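A curation step of this kind, turning forum threads into instruction-style question-answer pairs, might look like the sketch below; the field names and filtering heuristic are assumptions, and the actual pipeline is described in Supplementary Note B.

```python
import json

# Illustrative curation step: converting forum discussion threads into
# instruction-style question-answer pairs for fine-tuning. Field names and
# the filtering heuristic are assumptions, not the authors' pipeline.

threads = [
    {"question": "Why is my Cas9 cutting efficiency low in HEK293T cells?",
     "replies": ["Check gRNA secondary structure and try a fresh RNP prep.",
                 "+1"]},
]

def to_instruction_pairs(threads, min_reply_len=10):
    pairs = []
    for t in threads:
        # keep only substantive replies, then use the longest as the answer
        answers = [r for r in t["replies"] if len(r) >= min_reply_len]
        if answers:
            pairs.append({"instruction": t["question"],
                          "output": max(answers, key=len)})
    return pairs

# Serialize to JSONL, a common format for instruction-tuning datasets:
jsonl = "\n".join(json.dumps(p) for p in to_instruction_pairs(threads))
```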
To assess the performance of the Q&A mode, we used the Gene-editing bench Q&A test set (Supplementary Note C). The test questions encompass basic gene-editing knowledge, experimental troubleshooting, CRISPR applications in various biological systems, ethics and safety. We prompted CRISPR-GPT, GPT-3.5-turbo and GPT-4o to generate responses to the test questions. Three human experts scored the answers in a fully blinded setting. The test demonstrated that the Q&A mode outperformed baseline LLMs in accuracy, reasoning and conciseness, with improvements of 12%, 15% and 32%, respectively, over GPT-4o (Fig. 4f). Human evaluators observed that general-purpose LLMs sometimes make factual errors and tend to provide extensive answers, not all of which are relevant to the questions. For example, one question was about solving cell growth issues in an experiment where a scientist performed Cas9 editing followed by single-cell sorting using MCF-7 cells. For this question, the Q&A mode provided a concise, accurate summary of potential reasons and actionable solutions. In contrast, GPT-4o responded with a long list of nine factors/options, at least two of which were not applicable to MCF-7 cells (Extended Data Fig. 2). This and other examples (Extended Data Figs. 3 and 4) showcase the advantage of the CRISPR-GPT Q&A mode. Overall, the evaluation results confirmed that the multisource Q&A mode is better at answering advanced research questions about gene editing.
To further evaluate the human user experience of CRISPR-GPT, we assembled a panel of eight gene-editing experts to assess the agent's performance on end-to-end experiments covering all 22 individual tasks (Supplementary Table 2 and 3 demos). The experts were asked to rate their experience in four dimensions: accuracy, reasoning and action, completeness and conciseness (see Supplementary Note C for detailed rubrics). CRISPR-GPT demonstrated improved accuracy and strong capabilities in reasoning and action, whereas general LLMs, such as GPT-4o, often included errors and were prone to hallucination (Fig. 5a,b).
As highlighted by human evaluators' observations (Fig. 5c), the CRISPR-GPT agent provides users with more accurate, concise and well-rounded instructions for executing the planned experiments. Its ability to perform specialty gene-editing tasks, such as exon-selected gRNA design, customized off-target prediction and automated sequencing data analysis, reinforced its advantage over general-purpose LLMs. This is confirmed by the task-specific evaluation results (Fig. 5b). Despite its strengths, CRISPR-GPT struggled with complex requests and rare biological cases, highlighting areas for improvement (limitations in Supplementary Note D).
To showcase and validate CRISPR-GPT's ability as an AI co-pilot to general biologists, we enlisted two junior researchers unfamiliar with gene editing. They used CRISPR-GPT in two real-world experiments: (1) designing and conducting a multigene knockout and (2) epigenetic editing, from scratch.
In the first experiment, the junior researcher conducted gene knockouts in the human A549 lung adenocarcinoma cell line, targeting four genes involved in tumour survival and metastasis: TGFβR1, SNAI1, BAX and BCL2L1 (Fig. 6). The experiment was designed from scratch with CRISPR-GPT (Fig. 6a). On the basis of the user-AI interaction, enAsCas12a was selected for its multitarget editing capability and low off-target effects. For delivery, CRISPR-GPT recommended lentiviral transduction for stable Cas and gRNA expression. The gRNAs for the four target genes were designed through CRISPR-GPT. Furthermore, CRISPR-GPT provided step-by-step protocols for gRNA cloning, lentivirus production and viral delivery into A549 cells. To validate the editing, the researcher followed CRISPR-GPT's next-generation sequencing (NGS) protocol, using assay primers designed via the integrated Primer3 tool. After generating the NGS data, the raw sequencing files were uploaded into CRISPR-GPT for automated analysis through the CRISPResso2 pipeline. The analysis reports, sent directly via email, summarized the editing outcomes and showed consistently high editing efficiency (~80%) across all target genes (Fig. 6b, Supplementary Demo Video 3; user interactions summarized in Supplementary Table 4; full chat history listed in Supplementary Table 2). To further assess the biological phenotypes of TGFβR1 and SNAI1 knockout in A549 cells, the researcher conducted an epithelial-mesenchymal transition (EMT) induction experiment by treating A549 cells with TGFβ (Fig. 6c and Methods). The qPCR results revealed that the knockout A549 cell lines (A549 TGFβR1 KO and A549 SNAI1 KO) showed up to 9-fold reduction in the CDH1 expression change and up to 34-fold reduction in the VIM expression change, both key marker genes in the EMT process. This confirms the biological role of TGFβR1 and SNAI1 signalling in driving EMT progression (a crucial driver of metastasis) in lung cancer cells (Fig. 6d).
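Relative expression changes of this kind are commonly computed with the standard 2^-ΔΔCt method for qPCR data; whether the study used exactly this formula is an assumption, and the Ct values below are made up for illustration.

```python
# Sketch of relative expression via the standard 2^-ΔΔCt method, commonly
# used for qPCR analyses such as the EMT marker measurements above.
# Whether the study used exactly this formula is an assumption, and the
# Ct values are illustrative.

def fold_change(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene versus a control condition."""
    d_ct_sample = ct_target - ct_ref            # normalize to reference gene
    d_ct_control = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(d_ct_sample - d_ct_control)

# e.g. a marker gene in treated KO cells versus untreated cells,
# normalized to a housekeeping gene (all Ct values hypothetical):
fc = fold_change(ct_target=24.0, ct_ref=18.0,
                 ct_target_ctrl=21.0, ct_ref_ctrl=18.0)
```

Here ΔΔCt = (24 - 18) - (21 - 18) = 3, giving a relative expression of 2^-3 = 0.125, that is, an 8-fold reduction.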
In the second experiment, the junior researcher performed epigenetic editing to activate two genes involved in cancer immunotherapy resistance in a human melanoma model cell line (Fig. 6e; user interactions summarized in Supplementary Table 4; full chat history listed in Supplementary Table 2). CRISPR-GPT guided the researcher through the full workflow: identifying the most suitable CRISPR activation system, selecting an appropriate delivery method for A375 cells, designing dCas9 gRNAs (three gRNAs per gene) and generating protocols for validating editing outcomes. After editing was completed, measurements of target protein expression levels confirmed successful activation of both genes, with up to 56.5% efficiency for NCR3LG1 and 90.2% efficiency for CEACAM1 when comparing gRNA-edited groups with negative control gRNAs (Fig. 6f).
Overall, CRISPR-GPT enabled successful completion of the first set of AI-guided gene-editing experiments. Interactions between the researchers and LLM-powered agents led to efficient, accurate and ethically mindful gene editing on the first attempt, even by users new to the technique.