Scientists create AI model that maps how genes work together inside human cells

Researchers at Mount Sinai's Icahn School of Medicine developed a Gene Set Foundation Model that learns gene function patterns across thousands of biological contexts. Inspired by ChatGPT, the AI model analyzes millions of gene groupings to reveal how genes collaborate in cells, potentially accelerating drug discovery and disease understanding.

Mount Sinai Researchers Develop Gene Set Foundation Model

Scientists at the Icahn School of Medicine at Mount Sinai have developed an AI model that reveals how genes work together inside human cells, marking a significant advance in understanding biology and disease. Published on May 21 in Patterns, a Cell Press journal, the study introduces a Gene Set Foundation Model (GSFM) designed to learn gene grouping patterns across thousands of biological contexts [1][2].

The work draws inspiration from large language models such as ChatGPT, which learn how words gain meaning depending on context. Similarly, the GSFM learns how genes behave differently depending on their cellular "context," according to senior author Avi Ma'ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics [1].

Source: News-Medical

Understanding Gene Function Through Contextual Learning

"Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell," Dr. Ma'ayan explains. "A single gene can play different roles in different settings, much like a word can have different meanings in different sentences" [2]. This new understanding of gene organization could eventually support the development of better diagnostics, biomarkers, and therapies.

The model addresses one of biology's major unsolved questions: how genes organize within cells. By learning from millions of gene groupings derived from published research and gene expression datasets, the GSFM creates a reference framework that helps scientists interpret complex multi-omics datasets more effectively [1].

Training on Millions of Gene Sets Across Biological Contexts

To build the AI model, researchers compiled millions of gene sets from published scientific studies and gene expression datasets, drawing on hundreds of thousands of independent research efforts. The system was trained much like solving a puzzle: given part of a gene set, it predicted the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
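The puzzle-style objective can be illustrated with a toy example. The sketch below is not the GSFM architecture, which the article does not describe; it assumes a simple co-occurrence count as a stand-in for the learned model, and every function name and gene label in it is hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def mask_gene_set(gene_set, mask_fraction=0.3, seed=0):
    """Hide a random fraction of a gene set; return (visible, hidden)."""
    genes = sorted(gene_set)  # sort so the split is reproducible
    n_hidden = max(1, int(len(genes) * mask_fraction))
    hidden = set(random.Random(seed).sample(genes, n_hidden))
    return set(genes) - hidden, hidden


def build_cooccurrence(gene_sets):
    """Count how often each ordered pair of genes shares a gene set."""
    counts = Counter()
    for gene_set in gene_sets:
        counts.update(permutations(sorted(gene_set), 2))
    return counts


def predict_missing(visible, counts, vocabulary, top_k=2):
    """Rank candidate genes by total co-occurrence with the visible genes."""
    scores = {
        candidate: sum(counts[(gene, candidate)] for gene in visible)
        for candidate in vocabulary - visible
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


# Placeholder gene labels, not real biology.
toy_sets = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"E", "F"}, {"A", "B", "C"}]
vocabulary = set().union(*toy_sets)
counts = build_cooccurrence(toy_sets)

# One training example: part of the set is shown, the rest must be guessed.
visible, hidden = mask_gene_set({"A", "B", "C", "D"}, mask_fraction=0.5)

# Given only A and B, the toy predictor proposes the genes most often seen with them.
guesses = predict_missing({"A", "B"}, counts, vocabulary)
```

A real foundation model would replace the co-occurrence table with learned parameters, but the input and output of each training step have the same shape: a partial gene set goes in, and a ranking of candidate missing genes comes out.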

"Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information," says Dr. Ma'ayan. "This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology" [1].

Identifying Gene-Gene and Gene-Function Relationships

The AI model demonstrated strong performance when benchmarked against other approaches, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this predictive capability, researchers trained the model using gene sets from publications up to a defined cutoff date, then tested whether it could predict discoveries reported in studies published after that date [1].
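The cutoff-date evaluation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's benchmark: the record layout, the function names, the dates, and the choice of recall over newly reported gene pairs as the score are all hypothetical.

```python
from datetime import date
from itertools import combinations


def split_by_cutoff(records, cutoff):
    """Gene sets published up to the cutoff train the model; later ones test it."""
    train = [r for r in records if r["published"] <= cutoff]
    test = [r for r in records if r["published"] > cutoff]
    return train, test


def gene_pairs(records):
    """Collect every unordered gene pair that shares a gene set."""
    pairs = set()
    for record in records:
        pairs.update(combinations(sorted(record["genes"]), 2))
    return pairs


def recall_on_novel_pairs(predicted, train, test):
    """Fraction of pairs first reported after the cutoff that were predicted."""
    novel = gene_pairs(test) - gene_pairs(train)
    if not novel:
        return 0.0
    normalized = {tuple(sorted(pair)) for pair in predicted}
    return len(normalized & novel) / len(novel)


# Placeholder records with made-up gene labels and dates.
records = [
    {"published": date(2020, 1, 1), "genes": {"G1", "G2"}},
    {"published": date(2021, 6, 1), "genes": {"G2", "G3"}},
    {"published": date(2023, 3, 1), "genes": {"G1", "G3"}},
    {"published": date(2023, 9, 1), "genes": {"G1", "G2", "G4"}},
]
train, test = split_by_cutoff(records, cutoff=date(2022, 1, 1))

# Stand-in for model output: pairs predicted using only the training records.
predicted = [("G1", "G3"), ("G3", "G4")]
score = recall_on_novel_pairs(predicted, train, test)
```

Splitting by publication date rather than at random keeps later findings out of the training data, so a correct prediction cannot be explained by the model having already seen the result.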

The model can identify functions of poorly understood genes without immediate laboratory experiments, highlight genes involved in disease processes, suggest potential drug targets, and provide a reusable knowledge system for data analysis tasks in biomedical research. One immediate application is improving gene set enrichment analysis, a widely used method in molecular biology research [1].
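For readers unfamiliar with enrichment analysis, its simplest classical form asks whether a list of genes overlaps a known gene set more than chance would predict. The sketch below shows that standard hypergeometric over-representation test; it is not the GSFM-based method, which the article does not detail, and the function name and example numbers are illustrative assumptions.

```python
from math import comb


def enrichment_pvalue(query, gene_set, background_size):
    """P(overlap >= observed) under the hypergeometric distribution.

    Assumes both `query` and `gene_set` are drawn from the same background
    of `background_size` genes.
    """
    observed = len(query & gene_set)
    n_query = len(query)
    n_set = len(gene_set)
    total = comb(background_size, n_query)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_set, i) * comb(background_size - n_set, n_query - i)
        for i in range(observed, min(n_query, n_set) + 1)
    )
    return tail / total


# Placeholder labels: a 3-gene query fully contained in a 3-gene set,
# with a background of 10 genes.
p_full_overlap = enrichment_pvalue({"A", "B", "C"}, {"A", "B", "C"}, 10)

# A query with no overlap gives a p-value of 1, since any overlap counts.
p_no_overlap = enrichment_pvalue({"X", "Y", "Z"}, {"A", "B", "C"}, 10)
```

A small p-value means the overlap is unlikely to arise by chance, which is how a gene list gets labeled as enriched for a pathway or function.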

Future Integration with Drug-Focused AI Models

The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is integrating it with language-based models to generate natural-language explanations of gene function. Another future direction involves combining GSFM with drug-focused AI models, with the long-term aim of predicting drug interactions with cells and supporting the design of new therapeutics [2].

This work was partially funded by NIH grants, and the GSFM is accessible at https://gsfm.maayanlab.cloud. The study's authors include Daniel J. B. Clarke, Giacomo B. Marino, and Avi Ma'ayan [2].
