2 Sources
[1]
This AI has chemical expertise -- and helps synthesize 35 new drugs and materials
Searching for blockbuster drugs and wonder materials is an arduous task for chemists. To make their promising compounds, they must trawl through millions of known chemical reactions, with hundreds of thousands more added annually, and then test whether it is possible to synthesize them.
Now, researchers have created an artificial-intelligence system that vastly simplifies and accelerates the process of chemical synthesis. The system, which is called MOSAIC and is described in a study published in Nature on 19 January, recommended conditions that researchers were able to use to generate 35 compounds with the potential to become products like pharmaceuticals, agrochemicals or cosmetics without needing to do any further trawling or tweaking.
"The synthesis of small molecules is the slow step in drug discovery and a number of other important areas," says study co-author Timothy Newhouse, a chemist at Yale University in New Haven, Connecticut. MOSAIC could remove this bottleneck, adds Newhouse, and so could lead to more and better products. It is "capable of drafting complete laboratory instructions -- detailed enough for chemists to follow -- to help create molecules that have not previously existed".
Predicting the conditions of chemical reactions has been a key focus of AI use in chemistry. One of the most prominent tools is IBM's RXN for Chemistry, which is based on a large language model (LLM). It uses a notation called the simplified molecular-input line-entry system (SMILES), which translates chemical structures into letters, numbers and punctuation that are better suited to a system that recognizes language. The SMILES approach makes it easier to process chemical information such as starting materials and solvents. By contrast, LLMs such as ChemCrow are trained for chemistry tasks using natural-language data.
"Our goal was to build a general model that could read chemistry the way chemists write it [by] listening to the language of experimental procedures and quickly turning that collective voice into a practical suggestion," says Newhouse. Newhouse adds that integrating the step-by-step instructions that MOSAIC produces into automated systems would be a "natural next step".
The researchers used an AI system they had developed previously to cluster a database of around one million reactions extracted from patents into 2,285 subsets. Using the subsets, the team trained Meta's partially open-source Llama LLM to create 2,498 separate expert models, each specialized in one combination of chemical transformation starting from one type of molecule. This approach can run on computers locally because it uses fewer parameters than do the major LLMs.
Martin Seifrid, a materials scientist at North Carolina State University in Raleigh, says that MOSAIC is notable in that it avoids "throwing the largest possible model at a problem, instead choosing to focus on a carefully designed system of much smaller 'expert' models". "Each specialized model is more accurate within its domain," Seifrid says.
The researchers used MOSAIC to suggest conditions to make 52 new substances. Testing the methods in the lab, the researchers could successfully produce 35 of them. MOSAIC also accurately predicted the colour and form of the compounds. It also suggested conditions for reaction methodologies that are absent from the millions of reactions used in the experts' training, suggesting an entirely new way to make molecules called azaindoles, which was successful when tested.
The Yale team worked on MOSAIC with researchers at the multinational drug company Boehringer Ingelheim's Connecticut site, who are already using the system. "They are interested in designing new synthetic pathways," says Victor Batista, a theoretical and computational chemist also at Yale, and a co-author of the study. "If they reduce the number of steps, they save a lot of money." MOSAIC is available as open-source code for other groups to use, he adds.
The use of an expert framework is an "important conceptual advance in AI-assisted chemistry", says Kuangbiao Liao, a chemist at Guangzhou National Laboratory in China. "It moves AI from prediction to action, by reframing reaction-condition selection as a decision-making problem rather than a single-output prediction task," he says. The framework "preserves competing chemical objectives instead of collapsing them into one averaged model, which better reflects how chemists actually reason at the bench".
Xenofon Evangelopoulos, a computer scientist at the University of Liverpool, UK, agrees that the approach has much broader potential. "Beyond its practical utility as a robust AI tool for chemical synthesis, MOSAIC establishes a scalable paradigm for harnessing global chemical knowledge through modular specialization," he says.
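To make the SMILES idea from this article concrete, here is a minimal Python sketch that splits a SMILES string into the kind of text tokens a language model could consume. The token pattern is a simplified illustration written for this example; it is an assumption, not the tokenizer used by RXN for Chemistry or any other tool named above, and it covers only common organic elements.

```python
import re

# Simplified, illustrative token pattern (an assumption, not any tool's real tokenizer):
# bracket atoms, two-letter halogens, single-letter atoms, then bonds, branches and ring digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[=#\-+()/\\.@:]|%\d{2}|\d"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input that the simplified pattern cannot fully cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Ethanol and aspirin written as SMILES strings.
    print(tokenize_smiles("CCO"))
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Each molecule becomes a short sequence of symbols, which is why the notation suits systems built to process text.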
[2]
Scientists develop AI that designs molecules ten times faster
Instead of waiting years for lucky breaks, researchers are turning to machines to sketch out entirely new candidate molecules on demand. That change matters now because faster, more directed discovery can cut waste, lower costs, and move useful medicines and materials toward the real world sooner.
At New York University (NYU) and the University of Florida (UF), teams built a property-first generator. The work was led by Stefano Martiniani, Ph.D., and his group; they study how chemical structure shapes physical behavior. His group teamed up with chemists and model builders to turn target traits into candidate molecular blueprints. The setup lets the group work backward from a goal to a structure, instead of tweaking whatever chemistry already exists.
Their new artificial intelligence (AI) system, PropMolFlow, generated candidate molecules about ten times faster than many earlier tools. PropMolFlow started from random noise and then refined it, step by step, until atoms and bonds formed a stable pattern. The system needed about 100 computational steps to reach a valid structure, where other approaches often need around 1,000. Fewer steps cut waiting time on a computer, which can make early screening cycles move much more quickly.
Speed only helps if the output molecule makes chemical sense, because labs cannot synthesize nonsense structures. In past systems, programs sometimes drew bonds that violated basic bonding rules, a problem the team flagged in a report. "This matters because many earlier approaches produced structures that looked superficially plausible but violated basic chemical rules," said Martiniani. PropMolFlow cleared that hurdle by producing structures with correct bonding patterns and realistic shapes more than 90 percent of the time.
Even a chemically valid molecule can miss the property goal, so AI evaluation becomes a make-or-break step. The risk grows when one neural network proposes molecules and a second neural network estimates properties using similar training habits. "If a neural network generates a molecule and another neural network predicts its properties, both systems may share similar blind spots because they are drawing from the same reservoir of information; AI is then grading its own homework," observed Martiniani. To avoid shared blind spots, the team used density functional theory, a quantum method that calculates properties from electrons.
To train and test PropMolFlow, the team relied on the QM9 benchmark dataset of small molecules and quantum properties. Because each molecule comes with a standardized structure and labels, the model could learn links between shapes and target traits. The dataset covered only lightweight chemistry, so the model mostly saw compounds with a few dozen atoms and familiar elements. The narrow training set kept early tests clean, but it left open questions about how well larger, drug-like molecules would behave.
Beyond average cases, the team pushed the model toward out-of-distribution targets far outside its usual training range. They asked for underrepresented property levels, then generated many molecules and filtered them through the same chemistry checks. For some properties, physics-based calculations showed the generated set centered close to the target, even beyond typical examples. Where training data stayed thin, the model drifted, showing that algorithms still need experience to reliably chase extremes.
Fast generation matters most when researchers run many design rounds, because each round changes what they test next. PropMolFlow lets a lab generate and screen a large batch, then hand the best few to experiments. "With the ability to generate thousands of chemically valid, property-targeted candidates in minutes rather than hours, researchers can iterate faster: generate candidates, filter computationally, validate the best ones with physics or experiments, and feed results back to improve the next round," Martiniani explained. That loop can shorten the time between an idea and a test tube, although chemists still choose which ideas deserve effort.
Real medicines and high-performance materials often rely on larger molecules, which carry many moving parts and more failure modes. Bigger structures raise the odds of awkward shapes, unstable charges, or reactions that make a compound hard to synthesize. Researchers still need to adapt these models for bigger systems, because new atoms and bonds create far more ways to go wrong. Speed alone will not deliver a finished pill, but it can push more ideas toward serious testing.
Evaluating molecule generators has become its own science, because a flattering score can hide dangerous or impossible structures. In their analysis, the authors flagged open-shell molecules, species with unpaired electrons that often behave unpredictably in calculations. Teams at UF and NYU also repaired records with inconsistent bond and charge labels, then released 10,773 molecules with density functional theory checks. Better yardsticks reduce false confidence, and they help teams compare models without quietly accepting mistakes as normal output.
PropMolFlow showed that speed, chemical sense, and property targeting can move together when design starts from goals. As teams aim at larger molecules and tougher properties, they will still need strict checks and real experiments to confirm value.
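As an illustration of the "basic bonding rules" this article refers to, the following Python sketch flags atoms that carry more bonds than their element normally allows. It is a deliberately simplified check written for this example, not PropMolFlow's evaluation code: the valence table covers only a few neutral elements, and real validity checks also handle charges, aromaticity and 3D geometry.

```python
# Typical maximum valences for neutral atoms of a few common elements.
# Simplified on purpose: charged species such as ammonium are not handled.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}


def valence_violations(
    atoms: list[str], bonds: list[tuple[int, int, int]]
) -> list[int]:
    """Return indices of atoms whose total bond order exceeds the usual maximum.

    atoms: element symbols, one per atom.
    bonds: (atom_index_a, atom_index_b, bond_order) triples.
    """
    totals = [0] * len(atoms)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order
    return [i for i, element in enumerate(atoms) if totals[i] > MAX_VALENCE[element]]


if __name__ == "__main__":
    # Methane (valid) versus a carbon drawn with five hydrogens (invalid).
    methane = (["C", "H", "H", "H", "H"], [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
    bad_carbon = (["C"] + ["H"] * 5, [(0, i, 1) for i in range(1, 6)])
    print(valence_violations(*methane))
    print(valence_violations(*bad_carbon))
```

A generator's output could be passed through a filter like this before any more expensive physics-based calculation is run.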
Researchers unveiled two AI systems that transform chemical synthesis and molecule design. Yale's MOSAIC successfully helped synthesize 35 new compounds for pharmaceuticals and materials, while NYU's PropMolFlow generates molecular candidates ten times faster than previous tools. Both systems address critical bottlenecks in drug discovery and could accelerate the path from laboratory concept to real-world products.
Two groundbreaking AI systems are reshaping how scientists approach chemical synthesis and molecule design, potentially cutting years from the drug discovery process. Yale University researchers developed MOSAIC, an AI system for chemical synthesis that successfully helped create 35 new compounds with potential applications in pharmaceuticals, agrochemicals, and cosmetics [1]. Meanwhile, teams at New York University and the University of Florida built PropMolFlow, which generates candidate molecules about ten times faster than many earlier tools [2].
Source: Earth.com
"The synthesis of small molecules is the slow step in drug discovery and a number of other important areas," says Timothy Newhouse, a chemist at Yale University and study co-author [1]. MOSAIC addresses this bottleneck by drafting complete laboratory instructions detailed enough for chemists to follow when creating molecules that have not previously existed.
The MOSAIC system represents a conceptual shift in how AI is applied to chemistry. Rather than deploying a single massive large language model, researchers trained Meta's partially open-source Llama LLM to create 2,498 separate expert models [1]. Each expert specializes in one combination of chemical transformation and starting-molecule type, and the training data came from a database of around one million patent-extracted reactions clustered into 2,285 subsets.
Martin Seifrid, a materials scientist at North Carolina State University, notes that MOSAIC avoids "throwing the largest possible model at a problem, instead choosing to focus on a carefully designed system of much smaller 'expert' models." Each specialized model proves more accurate within its domain [1]. This modular approach can run on local computers because it uses fewer parameters than the major LLMs.
When researchers tested MOSAIC's recommendations in the laboratory, they successfully produced 35 out of 52 suggested new substances. The system also accurately predicted the color and form of the compounds. More remarkably, MOSAIC suggested reaction methodologies absent from the reactions used in training, proposing an entirely new way to make azaindoles that proved successful when tested [1].
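The expert-model design can be pictured as a routing step: a query is sent to whichever specialist covers its region of chemistry. The Python sketch below shows one plausible way to do that, nearest-centroid lookup. The sources do not describe how MOSAIC actually selects an expert, so the routing rule, the two-dimensional feature vectors and the expert names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical cluster centroids in a toy 2-D reaction-feature space.
# A real system would use learned embeddings; these names and numbers are made up.
EXPERT_CENTROIDS = {
    "amide_coupling_expert": (0.9, 0.1),
    "suzuki_coupling_expert": (0.1, 0.9),
    "reduction_expert": (0.5, 0.5),
}


def route_to_expert(query: tuple[float, float]) -> str:
    """Return the expert whose centroid is closest to the query (Euclidean distance)."""
    return min(
        EXPERT_CENTROIDS,
        key=lambda name: math.dist(query, EXPERT_CENTROIDS[name]),
    )


if __name__ == "__main__":
    # A query that sits near the first centroid is handled by that expert alone.
    print(route_to_expert((0.8, 0.2)))
```

Only the selected expert has to be loaded and run, which is consistent with the point that a system of small models can work on local hardware.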
Source: Nature
While MOSAIC draws on existing reaction knowledge, PropMolFlow tackles molecule design from a different angle. Developed by a group led by Stefano Martiniani at New York University, the system works backward from target properties to molecular structure rather than tweaking existing chemistry [2]. This property-first approach lets researchers specify desired traits and receive candidate molecular blueprints on demand.
PropMolFlow gets its speed advantage by taking fewer refinement steps. The system starts from random noise and refines it step by step until atoms and bonds form a stable pattern, requiring about 100 computational steps to reach a valid structure where other approaches often need around 1,000 [2]. Fewer steps translate directly to reduced waiting time, enabling early screening cycles to move much more quickly.
Speed means little if generated molecules violate basic chemistry rules or cannot be synthesized in actual laboratories. PropMolFlow addresses this fundamental challenge by producing structures with correct bonding patterns and realistic shapes more than 90 percent of the time [2]. "This matters because many earlier approaches produced structures that looked superficially plausible but violated basic chemical rules," Martiniani explains.
The team also tackled another critical issue in AI-assisted chemistry: the risk of neural networks grading their own work. "If a neural network generates a molecule and another neural network predicts its properties, both systems may share similar blind spots because they are drawing from the same reservoir of information; AI is then grading its own homework," Martiniani observes [2]. To avoid this trap, the team used density functional theory, a quantum method that calculates properties from electrons rather than relying on another neural network.
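The 100-versus-1,000 step comparison can be illustrated with a toy sampler in Python. The sketch assumes a flow-style generator that moves a random starting point toward a target using Euler updates along a straight-line velocity field; that setup, the function name and the three-number "structure" are assumptions made for this example and are not PropMolFlow's actual model, which would call a trained neural network at each step.

```python
import random


def sample_with_euler_steps(
    target: list[float], n_steps: int, seed: int = 0
) -> tuple[list[float], int]:
    """Move a random starting point to `target` using n_steps Euler updates.

    Returns the final point and the number of velocity evaluations used.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from random noise
    dt = 1.0 / n_steps
    evaluations = 0
    for k in range(n_steps):
        t = k * dt
        # Toy velocity field pointing straight at the target; a real model
        # would evaluate a trained network here, once per step.
        velocity = [(goal - xi) / (1.0 - t) for goal, xi in zip(target, x)]
        evaluations += 1
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x, evaluations


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5]
    _, fast_cost = sample_with_euler_steps(target, n_steps=100)
    _, slow_cost = sample_with_euler_steps(target, n_steps=1000)
    print(fast_cost, slow_cost, slow_cost // fast_cost)
```

In this toy setting both runs end at the same point, so the only difference is cost: the sampler that needs ten times fewer steps makes ten times fewer model evaluations.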
The Yale team developed MOSAIC in collaboration with researchers at Boehringer Ingelheim's Connecticut site, who are already using the system. "They are interested in designing new synthetic pathways," says Victor Batista, a theoretical and computational chemist at Yale University and study co-author. "If they reduce the number of steps, they save a lot of money." MOSAIC is available as open-source code for other groups to use [1].
Kuangbiao Liao, a chemist at Guangzhou National Laboratory, calls the expert framework an "important conceptual advance in AI-assisted chemistry" that moves AI from prediction to action. The framework "preserves competing chemical objectives instead of collapsing them into one averaged model, which better reflects how chemists actually reason at the bench" [1].
These systems enable iterative design cycles that could fundamentally change how laboratories operate. "With the ability to generate thousands of chemically valid, property-targeted candidates in minutes rather than hours, researchers can iterate faster: generate candidates, filter computationally, validate the best ones with physics or experiments, and feed results back to improve the next round," Martiniani explains [2].
Newhouse suggests that integrating MOSAIC's step-by-step instructions into automated systems would be a "natural next step" [1]. This integration could create fully automated pipelines from computational chemistry design to physical synthesis, though chemists will still determine which candidates deserve experimental validation.
PropMolFlow still faces limitations when scaling to the larger, more complex molecules typical of modern pharmaceuticals. Bigger structures introduce more failure modes, awkward shapes, unstable charges, and synthesis challenges. Researchers acknowledge the need to adapt these models for larger molecular systems where new atoms and bonds create far more ways for designs to fail [2]. However, by cutting computational time and improving hit rates in early screening, these AI systems promise to push more promising ideas toward serious testing, potentially shortening the timeline from laboratory concept to market-ready product.
Summarized by Navi