2 Sources
[1]
A new generative AI approach to predicting chemical reactions
Caption: The FlowER (Flow matching for Electron Redistribution) system allows a researcher to explicitly keep track of all the electrons in a reaction to ensure that none are spuriously added or deleted in the process of predicting the outcome of a chemical reaction.
Many attempts have been made to harness the power of new artificial intelligence and large language models (LLMs) to try to predict the outcomes of new chemical reactions. These have had limited success, in part because until now they have not been grounded in an understanding of fundamental physical principles, such as the laws of conservation of mass. Now, a team of researchers at MIT has come up with a way of incorporating these physical constraints into a reaction prediction model, thereby greatly improving the accuracy and reliability of its outputs.
The new work was reported Aug. 20 in the journal Nature, in a paper by recent postdoc Joonyoung Joung (now an assistant professor at Kookmin University, South Korea); former software engineer Mun Hong Fong (now at Duke University); chemical engineering graduate student Nicholas Casetti; postdoc Jordan Liles; physics undergraduate student Ne Dassanayake; and senior author Connor Coley, who is the Class of 1957 Career Development Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science.
"The prediction of reaction outcomes is a very important task," Joung explains. For example, if you want to make a new drug, "you need to know how to make it. So, this requires us to know what product is likely" to result from a given set of chemical inputs to a reaction. But most previous efforts to carry out such predictions look only at a set of inputs and a set of outputs, without looking at the intermediate steps or enforcing the constraint that no mass is gained or lost in the process, something that cannot happen in actual reactions.
Joung points out that while large language models such as ChatGPT have been very successful in many areas of research, these models do not provide a way to limit their outputs to physically realistic possibilities, such as by requiring them to adhere to conservation of mass. These models use computational "tokens," which in this case represent individual atoms, but "if you don't conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction." Instead of being grounded in real scientific understanding, "this is kind of like alchemy," he says. While many attempts at reaction prediction only look at the final products, "we want to track all the chemicals, and how the chemicals are transformed" throughout the reaction process from start to end, he says.
To address the problem, the team made use of a method developed back in the 1970s by chemist Ivar Ugi, which uses a bond-electron matrix to represent the electrons in a reaction. They used this system as the basis for their new program, called FlowER (Flow matching for Electron Redistribution), which allows them to explicitly keep track of all the electrons in the reaction to ensure that none are spuriously added or deleted in the process. In the matrix, nonzero values represent bonds or lone electron pairs, and zeros represent their absence. "That helps us to conserve both atoms and electrons at the same time," says Fong. This representation, he says, was one of the key elements in building mass conservation into their prediction system.
The system they developed is still at an early stage, Coley says. "The system as it stands is a demonstration -- a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction."
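To make the bond-electron idea concrete, here is a minimal Python sketch; it is not FlowER's actual code. It assumes the convention usually associated with Ugi's matrices: the diagonal entry for atom i holds that atom's nonbonding (lone-pair) valence electrons, and the off-diagonal entry (i, j) holds the bond order between atoms i and j. Under that assumption every bond of order k is counted twice, so the sum of all entries equals the total number of valence electrons. The toy reaction (protonation of water, H2O plus H+ giving H3O+), the valence table, and the helper names `be_matrix`, `electron_count`, `formal_charges` and `conserves` are hypothetical illustrations.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the FlowER repository.
# Valence electron counts for the only elements used in this toy example.
VALENCE = {"H": 1, "O": 6}


def be_matrix(n_atoms, lone_electrons, bonds):
    """Build a symmetric bond-electron matrix.

    Diagonal entry (i, i): nonbonding (lone-pair) valence electrons on atom i.
    Off-diagonal entry (i, j): bond order between atoms i and j (0 = no bond).
    """
    m = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, e in enumerate(lone_electrons):
        m[i, i] = e
    for (i, j), order in bonds.items():
        m[i, j] = m[j, i] = order
    return m


def electron_count(m):
    """Total valence electrons: a bond of order k appears twice, contributing 2k."""
    return int(m.sum())


def formal_charges(m, elements):
    """Formal charge = valence electrons - lone-pair electrons - sum of bond orders."""
    charges = []
    for i, el in enumerate(elements):
        bond_order_sum = int(m[i].sum() - m[i, i])
        charges.append(VALENCE[el] - int(m[i, i]) - bond_order_sum)
    return charges


def conserves(before, after):
    """Same atoms (same matrix shape) and the same total electron count."""
    return before.shape == after.shape and electron_count(before) == electron_count(after)


# Atom order: O, H, H, H. Reactants are H2O plus a bare proton (H+).
atoms = ["O", "H", "H", "H"]
reactants = be_matrix(4, [4, 0, 0, 0], {(0, 1): 1, (0, 2): 1})
# Product H3O+: one oxygen lone pair has become the new O-H bond.
product = be_matrix(4, [2, 0, 0, 0], {(0, 1): 1, (0, 2): 1, (0, 3): 1})
# An "alchemical" prediction: the new O-H bond appears but no lone pair is used up,
# which silently creates two electrons.
bad = be_matrix(4, [4, 0, 0, 0], {(0, 1): 1, (0, 2): 1, (0, 3): 1})

print(electron_count(reactants), electron_count(product), electron_count(bad))  # 8 8 10
print(formal_charges(reactants, atoms), formal_charges(product, atoms))  # [0, 0, 0, 1] [1, 0, 0, 0]
print(conserves(reactants, product), conserves(reactants, bad))  # True False

# Electron-redistribution matrix: its entries sum to zero when electrons are conserved.
r = product - reactants
print(r.sum())  # 0
```

Because atoms are the rows and columns of a fixed-size matrix, a prediction cannot add or delete an atom without changing the shape, and an electron-count mismatch such as `bad` is caught by a simple sum. The article does not say how FlowER enforces this during generation, so treat the check as illustrative.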
While the team is excited about this promising approach, Coley says, "we're aware that it does have specific limitations as far as the breadth of different chemistries that it's seen." Although the model was trained using data on more than a million chemical reactions, obtained from a U.S. Patent Office database, those data do not include certain metals and some kinds of catalytic reactions, he says.
"We're incredibly excited about the fact that we can get such reliable predictions of chemical mechanisms" from the existing system, he says. "It conserves mass, it conserves electrons, but we certainly acknowledge that there's a lot more expansion and robustness to work on in the coming years as well." But even in its present form, which is being made freely available through the online platform GitHub, "we think it will make accurate predictions and be helpful as a tool for assessing reactivity and mapping out reaction pathways," Coley says. "If we're looking toward the future of really advancing the state of the art of mechanistic understanding and helping to invent new reactions, we're not quite there. But we hope this will be a steppingstone toward that."
"It's all open source," says Fong. "The models, the data, all of them are up there," including a previous dataset developed by Joung that exhaustively lists the mechanistic steps of known reactions. "I think we are one of the pioneering groups making this dataset, and making it available open-source, and making this usable for everyone," he says.
The FlowER model matches or outperforms existing approaches in finding standard mechanistic pathways, the team says, and makes it possible to generalize to previously unseen reaction types. They say the model could potentially be relevant for predicting reactions in medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems.
In their comparisons with existing reaction prediction systems, Coley says, "using the architecture choices that we've made, we get this massive increase in validity and conservation, and we get a matching or a little bit better accuracy in terms of performance." He adds that "what's unique about our approach is that while we are using these textbook understandings of mechanisms to generate this dataset, we're anchoring the reactants and products of the overall reaction in experimentally validated data from the patent literature." They are inferring the underlying mechanisms, he says, rather than just making them up. "We're imputing them from experimental data, and that's not something that has been done and shared at this kind of scale before."
As for the next step, he says, "we are quite interested in expanding the model's understanding of metals and catalytic cycles. We've just scratched the surface in this first paper," and most of the reactions included so far don't include metals or catalysts, "so that's a direction we're quite interested in." In the long term, he says, "a lot of the excitement is in using this kind of system to help discover new complex reactions and help elucidate new mechanisms. I think that the long-term potential impact is big, but this is of course just a first step."
The work was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium and the National Science Foundation.
[2]
A new generative AI approach to predicting chemical reactions improves accuracy and reliability
Many attempts have been made to harness the power of new artificial intelligence and large language models (LLMs) to try to predict the outcomes of new chemical reactions. These have had limited success, in part because until now they have not been grounded in an understanding of fundamental physical principles, such as the laws of conservation of mass. Now, a team of researchers at MIT has come up with a way of incorporating these physical constraints into a reaction prediction model, thereby greatly improving the accuracy and reliability of its outputs.
The new work is reported in the journal Nature, in a paper by recent postdoc Joonyoung Joung (now an assistant professor at Kookmin University, South Korea); former software engineer Mun Hong Fong (now at Duke University); chemical engineering graduate student Nicholas Casetti; postdoc Jordan Liles; physics undergraduate student Ne Dassanayake; and senior author Connor Coley, who is the Class of 1957 Career Development Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science.
"The prediction of reaction outcomes is a very important task," Joung explains. For example, he says, if you want to make a new drug, "you need to know how to make it. So, this requires us to know what product is likely to result from a given set of chemical inputs to a reaction." But most previous efforts to carry out such predictions look only at a set of inputs and a set of outputs, without looking at the intermediate steps or enforcing the constraint that no mass is gained or lost in the process, something that cannot happen in actual reactions.
Joung points out that while large language models such as ChatGPT have been very successful in many areas of research, these models do not provide a way to limit their outputs to physically realistic possibilities, such as by requiring them to adhere to conservation of mass.
These models use computational "tokens," which in this case represent individual atoms. However, he says, "If you don't conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction." Instead of being grounded in real scientific understanding, "this is kind of like alchemy," he adds. While many attempts at reaction prediction only look at the final products, "we want to track all the chemicals, and how the chemicals are transformed" throughout the reaction process from start to end, he says.
To address the problem, the team made use of a method developed back in the 1970s by chemist Ivar Ugi, which uses a bond-electron matrix to represent the electrons in a reaction. They used this system as the basis for their new program, called FlowER (Flow matching for Electron Redistribution), which allows them to explicitly keep track of all the electrons in the reaction to ensure that none are spuriously added or deleted in the process. In the matrix, nonzero values represent bonds or lone electron pairs, and zeros represent their absence. "That helps us to conserve both atoms and electrons at the same time," says Fong. This representation, he says, was one of the key elements in building mass conservation into their prediction system.
The system they developed is still at an early stage, Coley says. "The system as it stands is a demonstration -- a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction." While the team is excited about this promising approach, he says, "we're aware that it does have specific limitations as far as the breadth of different chemistries that it's seen." Although the model was trained using data on more than a million chemical reactions, obtained from a U.S. Patent Office database, those data do not include certain metals and some kinds of catalytic reactions, he says.
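The article does not describe FlowER's flow-matching setup in detail. As a hedged illustration only, the Python sketch below assumes the simplest straight-line (linear interpolation) path used in standard conditional flow matching, applied directly between a reactant and a product bond-electron matrix (diagonal entries as lone-pair electrons, off-diagonal entries as bond orders). In place of a trained neural network it uses the exact "oracle" velocity `x1 - x0`, which is the regression target such a network would be trained on. The helper names, the toy water-protonation example and the step count are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the FlowER repository.


def be_matrix(n_atoms, lone_electrons, bonds):
    """Bond-electron matrix as floats: diagonal = lone-pair electrons, (i, j) = bond order."""
    m = np.zeros((n_atoms, n_atoms))
    m[np.diag_indices(n_atoms)] = lone_electrons
    for (i, j), order in bonds.items():
        m[i, j] = m[j, i] = order
    return m


def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return (1.0 - t) * x0 + t * x1


def oracle_velocity(x0, x1):
    """Target velocity of the straight path (what a network would learn to predict)."""
    return x1 - x0


def euler_generate(x0, velocity_fn, steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


# Atom order: O, H, H, H (the last H starts as a bare proton).
reactants = be_matrix(4, [4, 0, 0, 0], {(0, 1): 1, (0, 2): 1})
product = be_matrix(4, [2, 0, 0, 0], {(0, 1): 1, (0, 2): 1, (0, 3): 1})

v = oracle_velocity(reactants, product)
print(v.sum())  # 0.0: electrons are only moved, never created or destroyed

# Electron total along the path stays at 8 even where entries are fractional.
totals = [interpolate(reactants, product, t).sum() for t in np.linspace(0.0, 1.0, 5)]
print(np.allclose(totals, 8.0))  # True

# Stand-in for a trained model: always return the oracle velocity.
generated = euler_generate(reactants, lambda x, t: v)
print(np.array_equal(np.rint(generated), product))  # True
```

Because the reactant and product matrices contain the same number of electrons, the velocity sums to zero, so every point on the straight path and every Euler step keeps the electron total fixed, even though intermediate matrices have fractional entries. How FlowER itself parameterizes the path, adds noise, or maps results back to integer matrices remains an assumption in this sketch.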
"We're incredibly excited about the fact that we can get such reliable predictions of chemical mechanisms" from the existing system, he says. "It conserves mass, it conserves electrons, but we certainly acknowledge that there's a lot more expansion and robustness to work on in the coming years as well." But even in its present form, which is being made freely available through the online platform GitHub, "we think it will make accurate predictions and be helpful as a tool for assessing reactivity and mapping out reaction pathways," Coley says. "If we're looking toward the future of really advancing the state of the art of mechanistic understanding and helping to invent new reactions, we're not quite there. But we hope this will be a stepping stone toward that." "It's all open source," says Fong. "The models, the data, all of them are up there," including a previous dataset developed by Joung that exhaustively lists the mechanistic steps of known reactions. "I think we are one of the pioneering groups making this dataset, and making it available open-source, and making this usable for everyone," he says. The FlowER model matches or outperforms existing approaches in finding standard mechanistic pathways, the team says, and makes it possible to generalize to previously unseen reaction types. They say the model could potentially be relevant for predicting reactions for medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems. In their comparisons with existing reaction prediction systems, Coley says, "Using the architecture choices that we've made, we get this massive increase in validity and conservation, and we get a matching or a little bit better accuracy in terms of performance." 
He adds, "What's unique about our approach is that while we are using these textbook understandings of mechanisms to generate this dataset, we're anchoring the reactants and products of the overall reaction in experimentally validated data from the patent literature." They are inferring the underlying mechanisms, he says, rather than just making them up. "We're imputing them from experimental data, and that's not something that has been done and shared at this kind of scale before." Speaking about the next step, he says, "We are quite interested in expanding the model's understanding of metals and catalytic cycles. We've just scratched the surface in this first paper," and most of the reactions included so far don't include metals or catalysts, "so that's a direction we're quite interested in." In the long term, he says, "A lot of the excitement is in using this kind of system to help discover new complex reactions and help elucidate new mechanisms. I think that the long-term potential impact is big, but this is, of course, just a first step."
MIT scientists have created FlowER, a new generative AI system that builds fundamental physical principles, such as conservation of mass and electrons, into chemical reaction prediction, with potential applications in drug discovery and materials science.
Researchers at the Massachusetts Institute of Technology (MIT) have developed a groundbreaking generative AI approach to predicting chemical reactions, potentially revolutionizing fields such as drug discovery and materials science. The new system, named FlowER (Flow matching for Electron Redistribution), addresses key limitations of existing AI models by incorporating fundamental physical principles [1].
Previous attempts to use artificial intelligence and large language models (LLMs) for predicting chemical reactions have faced significant challenges. These models often failed to account for fundamental physical laws, such as the conservation of mass, leading to unrealistic predictions. As Joonyoung Joung, a key researcher on the project, explains, "If you don't conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction. This is kind of like alchemy" [2].
The MIT team's solution, FlowER, builds upon a method developed in the 1970s by chemist Ivar Ugi. The system uses a bond-electron matrix to represent and track all electrons in a reaction, ensuring that no mass is spuriously added or deleted during the prediction process. This approach allows the model to conserve both atoms and electrons simultaneously, grounding the predictions in real scientific understanding [1].
FlowER has demonstrated remarkable capabilities, matching or outperforming existing approaches in finding standard mechanistic pathways. The system's ability to generalize to previously unseen reaction types opens up possibilities in fields including medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems [1].
Connor Coley, the senior author of the study, expressed enthusiasm about the system's reliable predictions of chemical mechanisms, noting that "it conserves mass, it conserves electrons, but we certainly acknowledge that there's a lot more expansion and robustness to work on in the coming years as well" [2].
In a move that could accelerate progress in the field, the MIT team has made FlowER freely available through GitHub. This open-source release includes not only the models but also the data, including a previously developed dataset that exhaustively lists the mechanistic steps of known reactions [1].
While the current version of FlowER is described as a proof of concept, it represents a significant step forward in the quest for accurate chemical reaction predictions. The researchers acknowledge that there is still work to be done, particularly in extending the system to metals and catalytic reactions, which are largely missing from its patent-derived training data, and in improving its robustness [2].
As the scientific community continues to build upon this foundation, FlowER and similar AI-driven approaches have the potential to dramatically accelerate drug discovery, materials development, and our understanding of complex chemical processes.
Summarized by Navi