2 Sources
[1]
New artificial intelligence model maps how genes work together inside cells
Mount Sinai Health System, May 21, 2026. Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease. The study, published in the May 21 online issue of Patterns, a Cell Press Journal [https://doi.org/10.1016/j.patter.2026.101565], introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts. The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular "context." Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell. A single gene can play different roles in different settings, much like a word can have different meanings in different sentences. "Just as modern language models learn the meaning of words from context, we asked whether AI could learn the 'meaning' of genes in the same way. Our GSFM was designed to do exactly that," says Avi Ma'ayan, PhD, senior corresponding author, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai. The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies. By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multi-omics datasets more effectively, say the investigators.
"The organization of genes within cells remains one of the major unsolved questions in biology. The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets," says Dr. Ma'ayan. The model can: help identify the function of poorly understood genes without immediate laboratory experiments; highlight genes involved in disease processes; suggest potential new drug targets and biomarkers; and provide a reusable knowledge system for many types of biomedical research data analysis tasks, for example, improved gene set enrichment analysis. In essence, say the investigators, GSFM offers a new "map" of how genes work together in different contexts. To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts. The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact. The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date. "Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information," says Dr. Ma'ayan. "This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology."
GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research. By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets. The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions. Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics. Mount Sinai Health System Journal reference: Clarke, D. J. B., et al. (2026). GSFM: A gene set foundation model pre-trained on a massive collection of diverse gene sets. Patterns. DOI: 10.1016/j.patter.2026.101565. https://www.cell.com/patterns/fulltext/S2666-3899(26)00074-7
[2]
Researchers Develop AI Model That Maps How Genes Work Together in Human Cells | Newswise
Newswise -- New York, NY -- [May 21, 2026] -- Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease. The study, published in the May 21 online issue of Patterns, a Cell Press Journal [https://doi.org/10.1016/j.patter.2026.101565], introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts. The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular "context." "Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell. A single gene can play different roles in different settings, much like a word can have different meanings in different sentences," says senior corresponding author Avi Ma'ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai. "Just as modern language models learn the meaning of words from context, we asked whether AI could learn the 'meaning' of genes in the same way. Our GSFM was designed to do exactly that." The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies. By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multi-omics datasets more effectively, say the investigators.
"The organization of genes within cells remains one of the major unsolved questions in biology. The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets," says Dr. Ma'ayan. The model can: help identify the function of poorly understood genes without immediate laboratory experiments; highlight genes involved in disease processes; suggest potential new drug targets and biomarkers; and provide a reusable knowledge system for many types of biomedical research data analysis tasks, for example, improved gene set enrichment analysis. In essence, say the investigators, GSFM offers a new "map" of how genes work together in different contexts. To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts. The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact. The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date. "Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information," says Dr. Ma'ayan. "This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology." GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research.
By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets. The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions. Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics. The gene pages and the GSFM model are accessible at https://gsfm.maayanlab.cloud and https://github.com/MaayanLab/gsfm. The paper is titled "GSFM: A Gene Set Foundation Model Pre-Trained on a Massive Collection of Diverse Gene Sets." The study's authors, as listed in the journal, are Daniel J. B. Clarke, Giacomo B. Marino, and Avi Ma'ayan. This work was partially funded by NIH grants OT2OD036435, OT2OD030160, U24CA264250, U24CA271114, R01DK131525, RC2DK131995. About the Icahn School of Medicine at Mount Sinai The Icahn School of Medicine at Mount Sinai is internationally renowned for its outstanding research, educational, and clinical care programs. It is the sole academic partner for the seven member hospitals* of the Mount Sinai Health System, one of the largest academic health systems in the United States, providing care to New York City's large and diverse patient population. The Icahn School of Medicine at Mount Sinai offers highly competitive MD, PhD, MD-PhD, and master's degree programs, with enrollment of more than 1,200 students. It has the largest graduate medical education program in the country, with more than 2,700 clinical residents and fellows training throughout the Health System. The Graduate School of Biomedical Sciences offers 13 degree-granting programs, conducts innovative basic and translational research, and trains more than 4705 postdoctoral research fellows. 
Ranked 11th nationwide in National Institutes of Health (NIH) funding, the Icahn School of Medicine at Mount Sinai is among the 90th percentile of U.S. private medical schools in Sponsored Programs Direct Expenditures per Principal Investigator, according to the Association of American Medical Colleges. More than 6,900 scientists, educators, and clinicians work within and across dozens of academic departments and multidisciplinary institutes with an emphasis on translational research and therapeutics. Through Mount Sinai Innovation Partners (MSIP), the Health System facilitates the real-world application and commercialization of medical breakthroughs made at Mount Sinai.
Researchers at Mount Sinai's Icahn School of Medicine developed a Gene Set Foundation Model that learns gene function patterns across thousands of biological contexts. Inspired by ChatGPT, the AI model analyzes millions of gene groupings to reveal how genes collaborate in cells, potentially accelerating drug discovery and disease understanding.
Scientists at the Icahn School of Medicine at Mount Sinai have developed an AI model that reveals how genes work together inside human cells, marking a significant advance in understanding biology and disease. Published in Patterns, a Cell Press Journal, on May 21, the study introduces a Gene Set Foundation Model (GSFM) designed to learn gene grouping patterns across thousands of biological contexts [1][2].
The work draws inspiration from large language models such as ChatGPT, which learn how words gain meaning depending on context. Similarly, the GSFM learns how genes behave differently depending on their cellular "context," according to senior author Avi Ma'ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics [1].
"Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell," Dr. Ma'ayan explains. "A single gene can play different roles in different settings, much like a word can have different meanings in different sentences" [2]. This new understanding of gene organization could eventually support the development of better diagnostics, biomarkers, and therapies.
The model addresses one of biology's major unsolved questions: how genes organize within cells. By learning from millions of gene groupings derived from published research and gene expression datasets, the GSFM creates a reference framework that helps scientists interpret complex multi-omics datasets more effectively [1].
To build the AI model, researchers compiled millions of gene sets from published scientific studies and gene expression datasets, so that the system learned from hundreds of thousands of independent research efforts. The system was trained similarly to solving a puzzle: given part of a gene set, it predicted the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
"Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information," says Dr. Ma'ayan. "This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology" [1].
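The puzzle-style training described above (show part of a gene set, predict the rest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `mask_fraction` parameter, the recall metric, and the placeholder gene names are all assumptions made for illustration, and the sources do not say how many genes GSFM hides per set or how it scores predictions.

```python
import random


def make_masked_example(gene_set, mask_fraction=0.3, rng=None):
    """Split one gene set into visible genes (model input) and hidden genes (targets).

    mask_fraction is a hypothetical knob, not a value reported for GSFM.
    """
    rng = rng or random.Random()
    genes = sorted(set(gene_set))
    if len(genes) < 2:
        raise ValueError("need at least two genes: one to show and one to hide")
    # Hide at least one gene and always leave at least one visible.
    n_hidden = round(len(genes) * mask_fraction)
    n_hidden = max(1, min(len(genes) - 1, n_hidden))
    hidden = set(rng.sample(genes, n_hidden))
    visible = [g for g in genes if g not in hidden]
    return visible, sorted(hidden)


def recall_at_k(ranked_genes, hidden, k):
    """Fraction of the hidden genes that appear among the top-k ranked predictions."""
    top_k = set(ranked_genes[:k])
    return len(top_k & set(hidden)) / len(hidden)


# Placeholder gene names, not real biology.
toy_set = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
visible, hidden = make_masked_example(toy_set, mask_fraction=0.33, rng=random.Random(0))

# An idealized ranking that lists every hidden gene first recovers all of them.
perfect_ranking = hidden + visible
print(recall_at_k(perfect_ranking, hidden, k=len(hidden)))  # 1.0
```

In a real training loop, a model would receive `visible` and be scored on how highly it ranks the genes in `hidden`; here the ranking is supplied by hand only to show how the score is computed.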
The AI model demonstrated strong performance when benchmarked against other approaches, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this predictive capability, researchers trained the model using gene sets from publications up to a defined cutoff date, then tested whether it could predict discoveries reported in studies published after that date [1].
The model can identify functions of poorly understood genes without immediate laboratory experiments, highlight genes involved in disease processes, suggest potential drug targets, and provide a reusable knowledge system for biomedical research data analysis tasks. One immediate application is improving gene set enrichment analysis, a widely used method in molecular biology research [1].
The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is integrating it with language-based models to generate natural-language explanations of gene function. Another future direction involves combining GSFM with drug-focused AI models, with the long-term aim of predicting drug interactions with cells and supporting the design of new therapeutics [2].
This work was partially funded by NIH grants, and the GSFM model is accessible at https://gsfm.maayanlab.cloud. The study's authors include Daniel J. B. Clarke, Giacomo B. Marino, and Avi Ma'ayan [2].
Summarized by Navi
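The cutoff-date evaluation described in the summary (train on gene sets published up to a date, test on later discoveries) can be sketched in Python as follows. This is an assumption-laden illustration rather than the paper's benchmark: the record layout, the function names, the idea of scoring on gene pairs that first co-occur after the cutoff, and the placeholder gene names and dates are all hypothetical.

```python
from datetime import date


def split_by_cutoff(records, cutoff):
    """Training records are published on or before the cutoff; test records after it."""
    train = [r for r in records if r["published"] <= cutoff]
    test = [r for r in records if r["published"] > cutoff]
    return train, test


def co_occurring_pairs(records):
    """All unordered gene pairs that appear together in at least one gene set."""
    pairs = set()
    for record in records:
        genes = sorted(set(record["genes"]))
        for i, first in enumerate(genes):
            for second in genes[i + 1:]:
                pairs.add((first, second))
    return pairs


def novel_pairs(train, test):
    """Pairs seen only after the cutoff: candidate 'future discoveries' to predict."""
    return co_occurring_pairs(test) - co_occurring_pairs(train)


# Toy records with placeholder genes and made-up dates.
records = [
    {"published": date(2019, 3, 1), "genes": ["GENE_A", "GENE_B", "GENE_C"]},
    {"published": date(2020, 6, 15), "genes": ["GENE_B", "GENE_D"]},
    {"published": date(2022, 1, 10), "genes": ["GENE_A", "GENE_D", "GENE_E"]},
]

train, test = split_by_cutoff(records, cutoff=date(2021, 1, 1))
targets = novel_pairs(train, test)
print(sorted(targets))
```

A model trained only on `train` would then be asked to rank candidate gene pairs, and the evaluation would check how highly the pairs in `targets` are ranked. Splitting by publication date rather than at random keeps later findings out of the training data.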