Curated by THEOUTPOST
On Sat, 1 Feb, 8:03 AM UTC
2 Sources
[1]
With generative AI, chemists quickly calculate 3D genomic structures
Every cell in your body contains the same genetic sequence, yet each cell expresses only a subset of those genes. These cell-specific gene expression patterns, which ensure that a brain cell is different from a skin cell, are partly determined by the three-dimensional structure of the genetic material, which controls the accessibility of each gene.
MIT chemists have now come up with a new way to determine those 3D genome structures, using generative artificial intelligence. Their technique can predict thousands of structures in just minutes, making it much speedier than existing experimental methods for analyzing the structures.
Using this technique, researchers could more easily study how the 3D organization of the genome affects individual cells' gene expression patterns and functions.
"Our goal was to try to predict the three-dimensional genome structure from the underlying DNA sequence," says Bin Zhang, an associate professor of chemistry and the senior author of the study. "Now that we can do that, which puts this technique on par with the cutting-edge experimental techniques, it can really open up a lot of interesting opportunities."
MIT graduate students Greg Schuette and Zhuohan Lao are the lead authors of the paper, which appears today in Science Advances.
From sequence to structure
Inside the cell nucleus, DNA and proteins form a complex called chromatin, which has several levels of organization, allowing cells to cram 2 meters of DNA into a nucleus that is only one-hundredth of a millimeter in diameter. Long strands of DNA wind around proteins called histones, giving rise to a structure somewhat like beads on a string.
Chemical tags known as epigenetic modifications can be attached to DNA at specific locations, and these tags, which vary by cell type, affect the folding of the chromatin and the accessibility of nearby genes. These differences in chromatin conformation help determine which genes are expressed in different cell types, or at different times within a given cell.
Over the past 20 years, scientists have developed experimental techniques for determining chromatin structures. One widely used technique, known as Hi-C, works by linking together neighboring DNA strands in the cell's nucleus. Researchers can then determine which segments are located near each other by shredding the DNA into many tiny pieces and sequencing it.
This method can be used on large populations of cells to calculate an average structure for a section of chromatin, or on single cells to determine structures within that specific cell. However, Hi-C and similar techniques are labor-intensive, and it can take about a week to generate data from one cell.
To overcome those limitations, Zhang and his students developed a model that takes advantage of recent advances in generative AI to create a fast, accurate way to predict chromatin structures in single cells. The AI model that they designed can quickly analyze DNA sequences and predict the chromatin structures that those sequences might produce in a cell.
"Deep learning is really good at pattern recognition," Zhang says. "It allows us to analyze very long DNA segments, thousands of base pairs, and figure out what is the important information encoded in those DNA base pairs."
ChromoGen, the model that the researchers created, has two components. The first component, a deep learning model taught to "read" the genome, analyzes the information encoded in the underlying DNA sequence and chromatin accessibility data, the latter of which is widely available and cell type-specific.
The second component is a generative AI model that predicts physically accurate chromatin conformations, having been trained on more than 11 million chromatin conformations. These data were generated from experiments using Dip-C (a variant of Hi-C) on 16 cells from a line of human B lymphocytes.
When integrated, the first component informs the generative model how the cell type-specific environment influences the formation of different chromatin structures, and this scheme effectively captures sequence-structure relationships. For each sequence, the researchers use their model to generate many possible structures. That's because DNA is a very disordered molecule, so a single DNA sequence can give rise to many different possible conformations.
"A major complicating factor of predicting the structure of the genome is that there isn't a single solution that we're aiming for. There's a distribution of structures, no matter what portion of the genome you're looking at. Predicting that very complicated, high-dimensional statistical distribution is something that is incredibly challenging to do," Schuette says.
Rapid analysis
Once trained, the model can generate predictions on a much faster timescale than Hi-C or other experimental techniques.
"Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU," Schuette says.
After training their model, the researchers used it to generate structure predictions for more than 2,000 DNA sequences, then compared them to the experimentally determined structures for those sequences. They found that the structures generated by the model were the same or very similar to those seen in the experimental data.
"We typically look at hundreds or thousands of conformations for each sequence, and that gives you a reasonable representation of the diversity of the structures that a particular region can have," Zhang says. "If you repeat your experiment multiple times, in different cells, you will very likely end up with a very different conformation. That's what our model is trying to predict."
The researchers also found that the model could make accurate predictions for data from cell types other than the one it was trained on. This suggests that the model could be useful for analyzing how chromatin structures differ between cell types, and how those differences affect their function. The model could also be used to explore different chromatin states that can exist within a single cell, and how those changes affect gene expression.
Another possible application would be to explore how mutations in a particular DNA sequence change the chromatin conformation, which could shed light on how such mutations may cause disease.
"There are a lot of interesting questions that I think we can address with this type of model," Zhang says.
The researchers have made all of their data and the model available to others who wish to use it.
The research was funded by the National Institutes of Health.
[2]
With generative AI, MIT chemists quickly calculate 3D genomic structures
MIT researchers have developed an AI model that can rapidly predict 3D genomic structures, which could make it easier to study how genome organization shapes gene expression and cellular function.
MIT chemists have unveiled a new approach to determining 3D genome structures using generative AI. Their model, called ChromoGen, can predict thousands of genomic structures in minutes, far faster than existing experimental methods [1][2].
Every cell in the human body contains the same genetic sequence, yet each cell expresses only a subset of its genes. The three-dimensional structure of the genetic material helps determine which genes are accessible, and therefore expressed, in a given cell type. This 3D organization is part of the reason a brain cell differs from a skin cell despite sharing the same genetic sequence [1].
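The researchers, led by Associate Professor Bin Zhang, developed ChromoGen, an AI model with two primary components. The first is a deep learning model taught to "read" the genome: it analyzes the underlying DNA sequence together with chromatin accessibility data, which is cell type-specific. The second is a generative AI model that predicts physically accurate chromatin conformations; it was trained on more than 11 million conformations derived from Dip-C experiments on 16 cells from a line of human B lymphocytes [1].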
This integrated approach allows ChromoGen to capture complex sequence-structure relationships and generate multiple possible conformations for each DNA sequence, reflecting the inherently disordered nature of DNA molecules [2].
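To make the idea of an ensemble of conformations concrete, here is a minimal Python sketch. It is not ChromoGen's code: it assumes each conformation is a list of 3D bead coordinates, uses a random walk as a hypothetical stand-in for generated structures, and picks an arbitrary distance cutoff. It counts how often each pair of beads is in contact across the ensemble, which is analogous to the population-averaged contact map that Hi-C reports.

```python
import math
import random


def random_walk_conformation(n_beads, step=1.0, rng=None):
    """Toy stand-in for one generated conformation: a 3D random walk.

    Hypothetical helper for illustration only; real chromatin is not a
    simple random walk.
    """
    rng = rng or random.Random()
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Pick a random direction by normalizing a Gaussian vector.
        dx, dy, dz = (rng.gauss(0, 1) for _ in range(3))
        norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
        x, y, z = coords[-1]
        coords.append((x + step * dx / norm,
                       y + step * dy / norm,
                       z + step * dz / norm))
    return coords


def contact_frequency(ensemble, cutoff):
    """Fraction of conformations in which beads i and j lie within `cutoff`."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(n):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
    return [[c / len(ensemble) for c in row] for row in counts]


rng = random.Random(0)
# 200 toy conformations of a 20-bead region (both numbers are arbitrary).
ensemble = [random_walk_conformation(20, rng=rng) for _ in range(200)]
freq = contact_frequency(ensemble, cutoff=2.0)

print(f"contact frequency, beads 0 and 1:  {freq[0][1]:.2f}")
print(f"contact frequency, beads 0 and 19: {freq[0][19]:.2f}")
```

Because no single structure describes the region, the per-pair frequencies summarize the whole distribution rather than any one conformation.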
ChromoGen's most significant advantage is its speed and efficiency. While traditional experimental methods like Hi-C can take about a week to generate data from a single cell, ChromoGen can produce a thousand structure predictions for a particular genomic region in just 20 minutes using a single GPU [1][2].
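As a rough back-of-the-envelope check on the scale of that difference, the sketch below compares throughput using the figures Schuette gives in the source article (a few dozen structures in six months versus a thousand in 20 minutes). Two numbers are assumptions, not from the article: "a few dozen" is taken as 36 structures, and "six months" as 182 days of elapsed time.

```python
# Figures quoted in the article for the model.
MODEL_STRUCTURES = 1_000
MODEL_MINUTES = 20

# Assumptions (not from the article): "a few dozen" = 36 structures,
# "six months" = 182 days of elapsed time.
EXPERIMENT_STRUCTURES = 36
EXPERIMENT_DAYS = 182

MINUTES_PER_DAY = 24 * 60


def structures_per_minute(structures, minutes):
    """Average throughput in structures per minute."""
    return structures / minutes


model_rate = structures_per_minute(MODEL_STRUCTURES, MODEL_MINUTES)
experiment_rate = structures_per_minute(
    EXPERIMENT_STRUCTURES, EXPERIMENT_DAYS * MINUTES_PER_DAY
)
speedup = model_rate / experiment_rate

print(f"model:      {model_rate:.1f} structures/min")
print(f"experiment: {experiment_rate:.2e} structures/min")
print(f"speedup:    ~{speedup:,.0f}x")
```

The two sides are not strictly like for like (predictions for one genomic region versus experimental structures for a cell type), so the ratio is best read as an order-of-magnitude illustration.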
Greg Schuette, one of the lead authors, emphasizes the model's efficiency:
"Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU" [2].
To validate their model, the researchers generated structure predictions for more than 2,000 DNA sequences and compared them to the experimentally determined structures for those sequences. ChromoGen's predictions were the same as, or very similar to, the structures seen in the experimental data [1][2].
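One generic way to compare a predicted ensemble with a reference ensemble is to average the pairwise distances in each and correlate the two maps. The Python sketch below illustrates that idea under stated assumptions; it is not the validation metric used in the paper, and both ensembles are hypothetical toy random walks rather than real predictions or Dip-C data.

```python
import math
import random


def toy_ensemble(n_structures, n_beads, seed):
    """Hypothetical stand-in for a set of 3D structures (Gaussian random walks)."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_structures):
        coords = [(0.0, 0.0, 0.0)]
        for _ in range(n_beads - 1):
            x, y, z = coords[-1]
            coords.append((x + rng.gauss(0, 1),
                           y + rng.gauss(0, 1),
                           z + rng.gauss(0, 1)))
        ensemble.append(coords)
    return ensemble


def mean_distance_map(ensemble):
    """Average distance between every pair of beads across the ensemble."""
    n = len(ensemble[0])
    total = [[0.0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(n):
                total[i][j] += math.dist(coords[i], coords[j])
    return [[v / len(ensemble) for v in row] for row in total]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each bead pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


predicted = toy_ensemble(100, 15, seed=1)   # stands in for model output
reference = toy_ensemble(100, 15, seed=2)   # stands in for experimental data

pred_vec = upper_triangle(mean_distance_map(predicted))
ref_vec = upper_triangle(mean_distance_map(reference))
r = pearson(pred_vec, ref_vec)

print(f"Pearson r between mean distance maps: {r:.3f}")
```

Comparing ensemble averages rather than individual structures fits the point made in the article: a single sequence gives rise to a distribution of conformations, so any one predicted structure is not expected to match any one experimental structure.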
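The development of ChromoGen opens up new possibilities for studying how the 3D organization of the genome affects gene expression patterns and cellular functions. By providing a fast and accurate method for predicting genomic structures, the tool could help researchers study how chromatin structures differ between cell types, which chromatin states can exist within a single cell, and how mutations in a DNA sequence change chromatin conformation in ways that may cause disease [1].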
As Bin Zhang notes, "Now that we can do that, which puts this technique on par with the cutting-edge experimental techniques, it can really open up a lot of interesting opportunities" [1].
While ChromoGen represents a significant leap forward in genomic structure prediction, the researchers acknowledge that there is still more to explore. The model's ability to generate multiple conformations for each sequence reflects the dynamic nature of DNA structures within cells, providing a more comprehensive view of genomic organization [2].
As this technology continues to develop, it may lead to new insights into cellular processes, disease mechanisms, and potential therapeutic targets, furthering our understanding of the complex relationship between genetic sequence, structure, and function.
Reference
[2] Massachusetts Institute of Technology | With generative AI, MIT chemists quickly calculate 3D genomic structures