2 Sources
[1]
Teams of AI agents boost speed of research
Artificial intelligence is poised to take on a more-active role in the laboratory: two new systems, described today in Nature, use teams of AI agents to develop hypotheses, propose experiments and analyse data.

Each system still relies on human input at various stages, but they boast timelines that can be remarkably shorter than when the process is left to human minds and hands alone. When the systems were asked to identify existing drugs that might be repurposed for different conditions, they arrived at plausible answers in a matter of hours.

"It almost seems like an agentic, in silico implementation of the thought process in a scientist's head," says Vivek Natarajan, a researcher at Google DeepMind in Mountain View, California, who helped to develop one of the systems. "The goal is to give scientists superpowers."

In one experiment, Natarajan and his colleagues used Google's Co-Scientist to look for approved drugs that could be repurposed to treat a form of blood cancer called acute myeloid leukaemia. The system identified a list of candidate drugs, from which human researchers selected five for further study. Three of these showed promise in preliminary studies on cells grown in the lab.

FutureHouse, a non-profit AI research lab in San Francisco, California, developed the second system, called Robin, and instructed it to find drugs to treat an eye condition called dry age-related macular degeneration. Robin began by consulting AI agents trained to conduct literature reviews and used their reports to select lab experiments to test a variety of candidate drugs. Humans carried out those experiments and fed the data back to Robin, which then supplied them to an AI agent specialized in analysing data.

Using this procedure, Robin suggested a list of molecular targets for treating dry age-related macular degeneration and identified a drug called ripasudil, which is used to treat the eye condition glaucoma, as a candidate treatment. The system suggested assays to confirm ripasudil's activity in the lab and then proposed follow-up experiments.

None of the drugs identified by the AI scientists have been fully evaluated, and many drug candidates that pass initial assays in lab-grown cells go on to fail more-stringent assays. But the examples show that these AI systems can arrive at plausible hypotheses, says Karandeep Singh, who oversees AI initiatives and strategy for University of California San Diego Health.

How well the AI assistants perform in day-to-day science in other contexts remains to be seen, he adds. "You don't know how it works in reality until it's been made available to a broad set of people," he says. Natarajan says that about 100 scientists outside Google DeepMind now have access to Co-Scientist and are testing its capabilities in a variety of settings.

In one experiment described in the paper, Natarajan and his colleagues asked Co-Scientist to develop a hypothesis that could explain why a particular suite of genes conferring antimicrobial resistance could be found in multiple bacterial species. The system took only days to arrive at the same conclusion as that of a group of researchers who had been studying the phenomenon and had not yet published the results.

Although AI agents can perform some tasks much more quickly than humans can, researchers could lose time and money if AI leads them down dead ends. Both Robin and Co-Scientist are based on large-language models, a form of AI that is prone to occasionally producing false but plausible-sounding answers.

These 'hallucinations' will probably always be a concern when dealing with this form of AI, says Ola Spjuth, who studies the use of AI for drug discovery at Uppsala University in Sweden. But cutting-edge AI models hallucinate less than their predecessors do, he notes, and a researcher using an AI co-scientist system can audit its decision-making process to learn why it made the choice that it did. Furthermore, Robin and Co-Scientist include steps in which AI agents debate hypotheses or compare results between themselves, potentially weeding out some hallucinations and faulty reasoning.

Even so, human oversight is key, Spjuth says. "We cannot just delegate important decisions right now to LLMs and AI agents," he says. "We need to supervise these methods."

The role for humans in research could be shifting in other ways as well. Companies are progressing rapidly towards sophisticated robots that can carry out some lab work. And today, a team of Google researchers report, in Nature, an agentic AI system called Empirical Research Assistance that aims to write high-quality software, and has been deployed in fields as diverse as cosmology and neuroscience.

The extent to which AI tools can take over hypothesis generation and data interpretation might depend on the nature of the research, says Samuel Rodriques, chief executive and co-founder of FutureHouse. In terms of drug discovery, "there's a huge way to go" before AI can design a new drug all the way from the initial target through to clinical testing, he says. "But I think it's possible."
[2]
Two AI-based science assistants succeed with drug-retargeting tasks
On Tuesday, Nature released two papers describing AI systems intended to help scientists develop and test hypotheses. One, Google's Co-Scientist, is designed as what they term "scientist in the loop," meaning researchers are regularly applying their judgements to direct the system. The second, from a nonprofit called FutureHouse, goes a step beyond and has trained a system that can evaluate biological data coming from some specific classes of experiments.

While Google says its system will also work for physics, both groups exclusively present biological data, and largely straightforward hypotheses -- this drug will work for that. So, this is not an attempt to replace either scientists or the scientific process. Instead, it's meant to help with the things that current AIs are best at: chewing through massive amounts of information that humans would struggle to come to grips with.

What's this good for?

There are some distinctions between the two systems, but both of them are what is termed agentic; they operate in the background by calling out to separate tools. (Microsoft has taken a similar approach with its science assistant as well; OpenAI seems to be an exception in that it simply tuned an LLM for biology.) And, while there are differences between them that we'll highlight, they are both focused on the same general issue: the utter profusion of scientific information.

With the ease of online publishing, the number of journals has exploded, and with them the number of papers. It has gotten tough for any researcher to stay on top of their field. Finding potentially relevant material in other fields is a real challenge. If you're focused on eye development, for example, one of the signaling systems used there may also be involved in the kidney, and it can be easy to miss what people are discovering about it there.

As the people at FutureHouse put this issue, "By focusing on 'combinatorial synthesis' (identifying non-obvious connections between disparate fields), Robin effectively targets 'low-hanging fruit' that human experts may overlook due to the compartmentalization of scientific knowledge."

This is a task that's well suited to AI, which can chew through the peer-reviewed literature in the background while researchers do other things. This isn't really a question of whether an AI could do something better or worse than a human; it's more of an issue of whether any human would end up doing these sorts of searches at all.

By finding enough connections among disparate research, these tools can make suggestions -- hypotheses, really -- about the biology. This can include things like what processes underlie biological behaviors, and what pathways and networks regulate those processes. And, in the cases explored here, it included suggesting known drugs that might target some of these pathways in diseased cells: acute myeloid leukemia in Google's case, and a form of macular degeneration for FutureHouse.

Co-Scientist

As you might imagine, Google's system is based on the company's Gemini large language model. That helps the system interpret a statement of research goals provided by human scientists and start a literature search to find relevant information and form hypotheses. Those are then evaluated relative to each other in a "tournament," the results of which are evaluated by a Reflection agent. An Evolution agent can then make improvements to any surviving ideas, which can be sent back through the process.

Key criteria considered throughout this process include plausibility, novelty, testability, and safety. And the Reflection tool has access to external search tools, as access to the scientific literature "prevented the hallucination of seemingly novel but implausible hypotheses," the company wrote.

As the paper puts it, scientists were kept in the loop at all times. In the search for potential drugs targeting leukemia, the suggestions made by the system were prioritized based on a review by a panel of experts, who had access to the literature Co-Scientist used to formulate its suggestions.

The results are what you'd expect from cancer therapies. Some of the drugs identified were effective, but only against subsets of a panel of myeloid leukemia cells. That's not unusual, given that there are multiple routes to unchecked growth, so drugs that block the route followed by one cell type may not be effective in cells that took a different route.

Google also mentioned that the system could do more general hypothesizing that doesn't involve drugs, using an example of the spread of virulence genes in bacteria. But the details of that work were fairly sparse.

The system is also set up so that it's model agnostic, allowing it to be switched over to better-performing models as AI systems evolve. But they also warn that "Co-Scientist also inherits the intrinsic limitations of its underlying models, including imperfect factuality and the potential for hallucinations."

And Robin

FutureHouse's system has some similarities but a couple of critical differences that go beyond naming all the agentic tools after birds. The main system, Robin, has access to specialized literature search tools. One, Crow, produces a concise summary of papers, while Falcon gives a deep overview of the information contained in the paper. The paper describing the system provides a clear sense of the advantages here: "Robin analyses 551 papers in 30 minutes compared to an estimated time of 540 hours for a human."

Taking those summaries, Robin then formed a series of hypotheses about disease mechanisms for macular degeneration and used these tools to provide a detailed report on the evidence for each mechanism. An LLM judge then made pairwise comparisons among the hypotheses, which resulted in relative rankings -- a bit like Google's tournament system.
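
To make the idea of turning pairwise comparisons into relative rankings concrete, here is a minimal Python sketch. It is not the method used by Robin or Co-Scientist: neither article describes the scoring rule, so this sketch simply counts head-to-head wins. The function name `rank_by_pairwise_wins`, the `toy_judge` stand-in for an LLM judge, and the scores are all hypothetical and exist only for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_by_pairwise_wins(
    hypotheses: List[str],
    judge: Callable[[str, str], str],
) -> List[str]:
    """Rank unique hypotheses by how many head-to-head comparisons each wins."""
    wins: Dict[str, int] = {h: 0 for h in hypotheses}
    # Compare every unordered pair exactly once.
    for a, b in combinations(hypotheses, 2):
        winner = judge(a, b)
        if winner not in (a, b):
            raise ValueError("judge must return one of the two candidates")
        wins[winner] += 1
    # sorted() is stable, so tied hypotheses keep their input order.
    return sorted(hypotheses, key=lambda h: wins[h], reverse=True)


# Hypothetical stand-in for an LLM judge: made-up scores, illustration only.
TOY_SCORES = {"hypothesis A": 0.4, "hypothesis B": 0.9, "hypothesis C": 0.6}


def toy_judge(a: str, b: str) -> str:
    """Pick whichever candidate has the higher made-up score."""
    return a if TOY_SCORES[a] >= TOY_SCORES[b] else b


ranking = rank_by_pairwise_wins(list(TOY_SCORES), toy_judge)
print(ranking)
```

In a real system the judge would be a model call rather than a lookup table, and a rating scheme that tolerates inconsistent or noisy judgments may be a better fit than raw win counts.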
In a similar manner, the system was re-deployed to suggest cell lines and culture conditions that could provide a model of macular degeneration, and it prepared reports on 30 candidate drugs. "These reports contained both justification for why each drug is suitable for mitigating the disease mechanism represented in the in vitro model and potential limitations the drug may pose," according to the FutureHouse team. Again, these reports were evaluated by human experts to determine which tests to go ahead with. Robin also suggested assays to test the drugs, which humans evaluated (in most cases, it appears they used variants of the suggested ones).

The key difference with Robin is that it includes a tool, Finch, that can automate the evaluation of data from some standard biological screening assays, like flow cytometry and RNA-seq. So, as long as your tests involve one of the assays that Finch can handle, then there's an additional step that can be performed by the system.

As above, Robin came up with a novel hypothesis: Increasing the ability of retinal cells to pick up debris outside the cells could provide some protection against the disease. And it identified a drug that seemed to provide just that sort of boost in the experiments it proposed.

As Google found, having tools designed specifically to interface with the scientific literature mattered. Swapping out Crow for OpenAI's o4-mini took the rate of hallucinated references from zero percent all the way up to 45 percent. FutureHouse also took a look at the performance of OpenAI's research-focused tool and found that, in all cases where it suggested drugs that Robin hadn't come up with, those drugs failed to have an effect on these cells.

Where does this leave us?

For starters, it's important to note that these successes come in one of the easier parts of drug development (not that any part of it can really be said to be easy). The AIs weren't being asked to design entirely new molecules, and most drugs fail during the animal and clinical trials phase, rather than during testing in cell culture. That's not to say repurposing existing drugs is nothing -- we already have safety profiles and agency approvals for these molecules, and many are off-patent and therefore cheap. But we're not at the point where AIs are solving hard problems.

This sort of hypothesis -- this mechanism underlies that disease, and the drug over there can target it -- is also one of the more concrete forms of hypothesis in biology. In my career as a scientist, I had to come up with hypotheses that were meant to address things like "mice with this mutation have a whole lot of defects in very different tissues; is there a single mechanism underlying them?" Or, "What's going on at the border of this gene's expression that is changing how cells respond to this signaling molecule?" It's not clear how these systems could handle these more open-ended scientific problems.

That said, the problem of literature overload is a real one in many fields, and systems meant to address it can potentially help us avoid a situation where all the information we needed was sitting around for a decade, but nobody put it together. Given we're still working through AI's growing pains, however, I'm also happy that there are at least two independently developed systems tackling this problem so that we can potentially run both and compare the results.

Nature, 2026. DOI: 10.1038/s41586-026-10652-y, 10.1038/s41586-026-10644-y.
Two new AI systems published in Nature use teams of AI agents to develop hypotheses and analyze data for scientific research. Google's Co-Scientist and FutureHouse's Robin identified promising drug candidates for acute myeloid leukemia and macular degeneration in hours, completing tasks that typically take human researchers months. While human supervision remains essential, these AI-based science assistants demonstrate how teams of AI agents can accelerate scientific research by synthesizing vast amounts of literature.
Two groundbreaking systems described in Nature are reshaping how scientific research unfolds, using teams of AI agents to develop hypotheses, propose experiments, and analyze data at unprecedented speeds. Google's Co-Scientist and FutureHouse's Robin both demonstrated the ability to identify promising drug candidates in mere hours, tasks that would typically consume months of human effort [1]. These AI-based science assistants represent a shift in laboratory workflows, though human supervision remains central to their operation.

Vivek Natarajan, a researcher at Google DeepMind who helped develop Co-Scientist, describes the system as "an agentic, in silico implementation of the thought process in a scientist's head." The goal, he explains, is to "give scientists superpowers" [1]. Both systems tackle a pressing challenge: the explosion of scientific literature has made it nearly impossible for researchers to stay current even within their own fields, let alone identify relevant connections across disciplines.
Source: Nature
Built on Google's Gemini model, Co-Scientist operates as what researchers call "scientist in the loop," keeping human researchers engaged at critical decision points [2]. The system interprets research goals provided by scientists and launches literature searches to generate hypotheses. These hypotheses then compete in a "tournament" whose results are evaluated by a Reflection agent, while an Evolution agent refines surviving ideas through iterative cycles.

In drug discovery experiments targeting acute myeloid leukemia, Co-Scientist identified a list of drug candidates from which human researchers selected five for further study. Three showed promise in preliminary tests on lab-grown cells [1]. The system evaluates suggestions based on plausibility, novelty, testability, and safety throughout the process. Access to scientific literature proved crucial: it "prevented the hallucination of seemingly novel but implausible hypotheses," according to the research team [2].

In another experiment, Co-Scientist developed a hypothesis explaining why certain antimicrobial resistance genes appear across multiple bacterial species. The system took only days to reach the same conclusion as a research group that had been studying the phenomenon and had not yet published its results [1]. About 100 scientists outside Google DeepMind now have access to test its capabilities across various research settings.

Developed by FutureHouse, a non-profit AI research lab in San Francisco, Robin takes the agentic approach further by incorporating specialized analysis capabilities. The system was instructed to find treatments for dry age-related macular degeneration, beginning with AI agents trained to conduct literature reviews [1]. Robin used these reports to select lab experiments testing various drug candidates, with humans conducting the physical experiments and feeding data back to the system.

An AI agent specialized in analyzing data then processed the experimental results. Through this procedure, Robin identified ripasudil, a drug used to treat glaucoma, as a candidate treatment for dry age-related macular degeneration. The system suggested assays to confirm ripasudil's activity and proposed follow-up experiments [1]. FutureHouse researchers emphasize that Robin targets "low-hanging fruit" that human experts might overlook due to knowledge compartmentalization, focusing on "combinatorial synthesis" to identify non-obvious connections between disparate fields [2].
While these systems accelerate scientific research dramatically, significant caveats remain. None of the drug candidates identified have been fully evaluated, and many compounds that pass initial assays in lab-grown cells fail more stringent testing [1]. Both systems rely on large language models, which are prone to hallucinations: false but plausible-sounding answers that could lead researchers down costly dead ends.

Ola Spjuth, who studies AI for drug discovery at Uppsala University, notes that hallucinations will likely remain a concern with this form of AI. However, cutting-edge models hallucinate less than their predecessors, and researchers can audit decision-making processes to understand the reasoning behind suggestions [1]. Both Robin and Co-Scientist include steps where AI agents debate hypotheses or compare results among themselves, potentially filtering out faulty reasoning.

"We cannot just delegate important decisions right now to LLMs and AI agents," Spjuth emphasizes. "We need to supervise these methods" [1]. Karandeep Singh, who oversees AI initiatives for University of California San Diego Health, adds that real-world performance across diverse contexts remains to be seen: "You don't know how it works in reality until it's been made available to a broad set of people" [1].

The question isn't whether AI can perform certain tasks better than humans, but whether humans would realistically conduct these exhaustive literature searches at all. By chewing through massive amounts of information in the background, these systems augment scientists' capabilities rather than replace them. The role of human researchers is shifting in other ways as well: companies are advancing sophisticated robots for lab work, while Google researchers reported another agentic AI system, called Empirical Research Assistance, that aims to write high-quality software and has been deployed in fields from cosmology to neuroscience [1].
Source: Ars Technica
Samuel Rodriques, chief executive and co-founder of FutureHouse, suggests that AI's ability to handle hypothesis generation and data interpretation may vary by research type. For drug discovery specifically, "there's a huge way to go" before AI can design a new drug all the way from the initial target through to clinical testing [1]. Google's system is model-agnostic, allowing it to switch to better-performing models as AI evolves, though it "inherits the intrinsic limitations of its underlying models, including imperfect factuality and the potential for hallucinations" [2].

As these AI-based science assistants move from proof-of-concept to broader deployment, researchers will be watching closely to see whether the speed gains translate across different scientific domains and whether human oversight can effectively catch AI errors before they derail expensive research programs.
Summarized by Navi