7 Sources
[1]
Google DeepMind unleashes new AI AlphaGenome to investigate DNA's 'dark matter'
DeepMind's AlphaGenome AI model could help solve the problem of predicting how variations in noncoding DNA shape gene expression
DNA is the blueprint for life, influencing everything about us -- including our health. We know that our genes, the genetic "words" that encode proteins, play a major role in our wellbeing. But the vast majority of our genome -- more than 98 percent, in fact -- consists of DNA that doesn't build proteins. Once disregarded as "junk DNA," this molecular dark matter is now known to be crucial for determining gene activity in ways that keep us healthy -- or cause disease. Exactly how this DNA shapes gene expression remains unclear -- but now the AI lab Google DeepMind has built a model that, the company says, can predict the function of long stretches of noncoding DNA. The information it turns up could help solve the problem of predicting how these chunks of DNA influence our health. Called AlphaGenome, the model takes in sequences of up to one million DNA letters, also known as base pairs, and predicts how mutations in those stretches affect gene expression. The model is described today in Nature. The tool, a version of which DeepMind has made freely available to other researchers, could help scientists narrow down theories for how certain DNA changes affect gene function. In turn, this knowledge could help scientists craft better treatments for genetic diseases. "Ever since the human genome was sequenced, people have been trying to understand the semantics of it -- this has been a longstanding goal for DeepMind," says Pushmeet Kohli, the company's vice president for science and a coauthor of the new study.
"It's like you have a huge book of three billion characters and something wrong happened in this book." "AlphaGenome can be used to say, 'If you change these words, what would be the effect?'" he adds. AlphaGenome works by combining information from several datasets focused on different aspects of gene expression -- how genes are turned on or off. The model is a successor of sorts to DeepMind's AlphaFold, an AI model that predicts the structure of almost every known protein from its amino acid sequence -- a central problem in biology. The researchers behind that effort shared the Nobel Prize in Chemistry in 2024. And in 2023 DeepMind released AlphaMissense, another AI tool that predicts how mutations in the regions of the genome that do generate proteins affect gene function. According to DeepMind's researchers, AlphaGenome performs as well or better than most other methods they tested. Previous tools generally required a trade-off between the length of a DNA sequence that could be used as input and accuracy. A key advance of AlphaGenome's approach is the ability to make accurate predictions about the function of extremely long genome sequences. "The genome is like the recipe of life," Kohli said in a press briefing about the work. "And really understanding 'What is the effect of changing any part of the recipe?' is what AlphaGenome sort of looks at." AlphaGenome is a research tool -- it's not meant to be used clinically and its results can't be easily applied to individual humans. But it could have applications in understanding how the genome regulates genes in different types of cells or tissues. It could also help us understand diseases through massive genome-wide association studies or for studying cancer, since tumors can have many different genetic mutations and its not always clear which ones cause disease. It could even be useful for diagnosing rare conditions and designing new gene therapies. 
"For all the best evaluations we have, AlphaGenome looks like they pushed [the field] forward a little bit," says David Kelley, a principal investigator at Calico Life Sciences, a company owned by Google's parent company Alphabet. Kelley was not involved with the study but has collaborated with the authors on a previous AI model. "I think the long sequence length that they're able to work with here is definitely one of those major engineering breakthroughs," he says, adding that the new AI is "incremental but real progress." Predicting how a disease manifests from the genome "is an extremely hard problem, and this model is not able to magically predict that," says Žiga Avsec, a research scientist leading DeepMind's genomics initiative. But AlphaGenome can narrow down the pool of possible mutations involved in a disease, making it useful for prioritizing research to pinpoint which gene variants are actually causing problems, he says. DeepMind is not blind to the fact that the model is imperfect. The company's researchers are working to both boost its predictive power and to better report how uncertain those predictions are.
[2]
AI tool AlphaGenome predicts how one typo can change a genetic story
The model can predict changes in 11 biological activities across 1 million DNA letters
A new deep-learning AI model may help scientists better decipher the plot of the genetic instruction book and learn how typos alter the story. AlphaGenome, created by Google DeepMind, is the latest in an ever-improving line of AI models built to analyze vast stretches of DNA. The previous front-runner, a model called Borzoi, could predict molecular signposts in stretches of DNA 500,000 bases long. AlphaGenome can analyze 1 million DNA building blocks at a time, researchers report January 28 in Nature. The model may have practical implications for diagnosing rare genetic diseases, identifying cancer-driving mutations, designing synthetic DNA sequences or therapeutic RNAs and better understanding basic biology. "AlphaGenome is not just a bigger model in terms of context length, but it actually is quite a leap forward in its overall utility," says Anshul Kundaje, a computational biologist at Stanford University who develops AI models for genomics. For instance, a genetic change may have no effect on nearby genes but could change the activity of genes far away. Because AlphaGenome examines longer stretches of DNA, it is more likely to spot such long-distance relationships. But AlphaGenome isn't perfect. Unpublished data from Kundaje's lab indicate the model struggles to predict how gene activity changes in individuals. Right now, the model is a tool for uncovering basic biology, not something doctors could use to diagnose or treat patients. AlphaGenome has "maxed out" what this type of model can do, Kundaje says. He predicts the next big leap will come from scientists generating new types of data for the model or its descendants to analyze. AlphaGenome can pinpoint biologically important spots down to single-base resolution, says Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York.
That's much higher resolution than Borzoi, which flagged points of biological interest in 32-base-pair bins. That's a big task considering that the model's reference is the 3-billion-base-long human genome, often called a genetic instruction book. The book is actually a multivolume, choose-your-own-adventure, pop-up encyclopedia. Genes, the short stories of the book, are told in small phrases that can be rearranged, shortened or skipped. In between the story fragments are passages that may contain instructions for how to read a different story entirely. Pages and chapters are intricately folded into each other so that pulling a tab in one passage causes something to pop up chapters away. Much of the book is filled with what many people thought was nonsense but is often essential reading material. Researchers have cataloged a dizzying array of punctuation marks, origami-like creases, syntax swaps, margin scribbles and other types of biological grammar that cells use to make sense of the book. AlphaGenome's task is to take a string of DNA letters and predict how plot points, punctuation and other variations affect 11 distinct biological processes, including RNA splicing, gene activity levels and certain protein-DNA interactions. The model considers 5,930 data points from studies of human DNA and 1,128 in mouse DNA. With those data, the AI can predict how changing a single letter, or base, in the million-base string alters the story. Specialized computational models that predict subsets of these biological functions have been in use for years, but AlphaGenome outperforms them on most measures and does particularly well at identifying some features in different types of cells, the researchers report. For example, AlphaGenome identified gene activity changes in certain cell types 14.7 percent better than Borzoi2.
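The resolution contrast above can be made concrete with a little arithmetic. The sketch below is a toy illustration, not Borzoi's or AlphaGenome's code: it assumes a made-up per-base signal with a single important base in a million-letter window and mean-pools it into 32-base-pair bins (real models may aggregate bins differently, for example by summing coverage). Binning leaves 31,250 values, and the spike can then only be located to within its 32-letter bin, at 1/32 of its original height.

```python
def bin_track(per_base, bin_size):
    """Mean-pool a per-base signal into fixed-width, non-overlapping bins."""
    if len(per_base) % bin_size:
        raise ValueError("signal length must be a multiple of bin_size")
    return [
        sum(per_base[i:i + bin_size]) / bin_size
        for i in range(0, len(per_base), bin_size)
    ]


def argmax(values):
    """Index of the first largest value."""
    return max(range(len(values)), key=values.__getitem__)


SEQ_LEN = 1_000_000              # one million-letter input window
signal = [0.0] * SEQ_LEN         # hypothetical per-base signal: flat...
signal[500_017] = 1.0            # ...except for one functionally important base

binned = bin_track(signal, bin_size=32)
peak_bin = argmax(binned)

print(len(binned))               # 31250 values instead of 1,000,000
print(argmax(signal))            # 500017: single-base output pinpoints the base
print(peak_bin)                  # 15625: binned output only narrows it to bases 500000-500031
print(binned[peak_bin])          # 0.03125: the spike is diluted 32-fold
```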
"By doing well on so many different genomic tasks simultaneously, we believe this demonstrates that the model has learned a powerful general representation of DNA sequences and the complex processes these sequences encode," said Natasha Latysheva of Google DeepMind January 27 during a news briefing. The tool could make things easier for researchers who are trying to understand how the genome works, says Judit GarcÃa González, a human geneticist at the Ichan School of Medicine at Mount Sinai in New York City. Before AlphaGenome, a researcher "might need to use three different tools with their own caveats, and [have] to learn how they work, for predicting say 20 different genomic functional consequences," she says. Now, AlphaGenome unites all those in one tool. AlphaGenome isn't an entirely new invention. It builds on previous models but uses aspects of those models in clever ways. "There is no single innovation in AlphaGenome that one can pinpoint as a critical innovation. It's really a system of lots of tricks and engineering," Koo says. AlphaGenome used one trick called ensemble distillation that Koo's lab has been experimenting with. That strategy pretrains multiple copies of the model each on computationally mutated DNA. Those models serve as teachers to a single student model that averages their outputs. It's like having 60 history professors give their account of an important event, Koo says. "If you consider the consensus across what every historian agrees, what overlaps across their story lines, that is probably what might actually be true." The consensus, he says, "tends to be more reliable than trusting any individual model."
[3]
Google's AlphaGenome wants to do for DNA what AlphaFold did for ...
Google's new deep learning model can predict the effect of small changes to DNA sequences up to one million base pairs in length and is particularly good with non-coding DNA, which has proven especially difficult to understand. The artificial intelligence (AI) tool - called AlphaGenome - offers researchers a way to better understand the human genome and may help scientists develop treatments for disease. Small variations in the human genome can have a big impact on a person's health, causing genetic disorders like cystic fibrosis or certain cancers. Most changes occur in the genome's non-coding regions, which make up 98% of the total DNA. These regions influence the expression of genes, rather than coding for proteins, and alterations can often have a range of biological effects, making it hard to predict their impact. AlphaGenome, developed by Google DeepMind, can predict the molecular impact of single-base-pair variations across whole DNA sequences up to a million base pairs in length. This builds on Google's earlier model, AlphaMissense, which was only able to understand the effects of variations in the coding region of DNA sequences. The new model - trained on human and mouse genome data - takes a DNA sequence as an input and gives predictions on various genetic signals that relate to specific biological functions. These include gene expression, DNA's accessibility to proteins and where gene splicing occurs. 'The key [benefit] is that you can introduce a mutation to the sequence, changing for example a C [base pair] to a T, and then use the model to compare these differences,' says Google DeepMind researcher Žiga Avsec. AlphaGenome matched or outperformed other state-of-the-art models in 25 out of 26 tasks predicting the effects of genetic variations.
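The comparison Avsec describes, scoring a variant by predicting on the reference sequence and on a copy carrying the mutation and then taking the difference, is simple to sketch. In this minimal version `toy_predict` is a hypothetical stand-in that counts two short motifs as its two output "tracks"; it is not AlphaGenome or its API, which returns thousands of tracks over a window of up to a million letters.

```python
def substitute(seq, pos, ref_base, alt_base):
    """Return a copy of seq with the base at pos changed from ref_base to alt_base."""
    if seq[pos] != ref_base:
        raise ValueError(f"expected {ref_base} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt_base + seq[pos + 1:]


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq)))


def toy_predict(seq):
    """Hypothetical stand-in for a sequence-to-function model: two motif-count 'tracks'."""
    return [count_motif(seq, "CG"), count_motif(seq, "TATA")]


def variant_effect(predict, ref_seq, pos, ref_base, alt_base):
    """Per-track difference between predictions for the variant and the reference."""
    alt_seq = substitute(ref_seq, pos, ref_base, alt_base)
    return [alt - ref for ref, alt in zip(predict(ref_seq), predict(alt_seq))]


ref = "GGCGATATAAAGG"
print(variant_effect(toy_predict, ref, 2, "C", "T"))  # [-1, 0]
```

Here the C-to-T change at position 2 destroys the only CG in the sequence, so the first track drops by one while the second is unaffected. Checking the reference base before substituting guards against applying a variant to the wrong position.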
The team were also able to simulate known DNA mutations responsible for a type of leukaemia, predicting the same results as those observed in the lab. 'Previously, the field required separate models for separate tasks,' says Avsec, adding that earlier models also often had a trade-off between sequence length and resolution. 'AlphaGenome unifies these under one roof.' Natasha Latysheva, a senior research engineer at DeepMind, explains that AlphaGenome may help improve fundamental knowledge about the genome, improve understanding of rare diseases and cancers or help scientists design new DNA sequences to treat specific conditions. AlphaGenome adds to the collection of AI tools developed by Google DeepMind, which includes the 2024 Nobel Prize-winning AlphaFold that predicts the 3D shape of proteins. Pushmeet Kohli, who led the work, explains that 'the genome is the recipe and understanding the effect of changing any part of the recipe is what AlphaGenome looks at'.
AlphaGenome turns genetic code into 'decipherable language of discovery'
Robert Goldstone, head of genomics at the Francis Crick Institute in the UK, believes that AlphaGenome is 'a foundational, high-quality tool that turns the static code of the genome into a decipherable language for discovery', but warns that it 'is not a magic bullet for all biological questions'. Despite the improvements, AlphaGenome still has a number of limitations. Like other models, it struggles to predict the influence of genetic alterations located more than 100,000 base pairs away from the genes they regulate, and it can only make predictions about DNA sequences from the species used to train the model - namely human and mouse. Another issue is interpreting results from the model, explains Jian Zhou, a genomics machine learning researcher at the University of Chicago in the US. 'Even when the model makes accurate predictions, it does not always directly inform us of the underlying biological processes,' he adds.
Google DeepMind released a preview of the model for non-commercial research in June last year. Since then, Kohli explains that nearly 3,000 scientists in 160 different countries have used AlphaGenome, submitting around 1 million requests each day. He hopes that 'AlphaGenome will continue to be a valuable resource for the scientific community and help scientists better understand genome function and disease biology, and ultimately drive new biological discoveries and ... new treatments'.
[4]
AI model from Google DeepMind reads recipe for life in our DNA
An AI model developed by Google's DeepMind could transform our understanding of DNA - the complete recipe for building and running the human body - and its impact on disease and medicine discovery, according to researchers. Called AlphaGenome, the model could help scientists discover why subtle differences in our DNA put us at risk of conditions such as high blood pressure, dementia and obesity. It could also dramatically accelerate our understanding of genetic diseases and cancer. The developers of the model acknowledge it's not perfect, but experts have described it as "an incredible feat" and "a major milestone". "We see AlphaGenome as a tool for understanding what the functional elements in the genome do, which we hope will accelerate our fundamental understanding of the code of life," says Natasha Latysheva, research engineer at DeepMind. The human genome is made up of three billion letters of DNA code - represented by the letters A, C, G and T. Around 2% of it is made up of genes which code for all the proteins the body needs to grow and function. The remaining 98%, which is less well understood, is labelled the 'dark genome'. It plays a crucial role in organising how genes are used in the body and is where many mutations linked to disease are found. AlphaGenome can analyse one million letters of code at a time, helping to unravel the 'dark genome'. It can predict not only where the genes are, but also what the 'dark genome' is influencing: for example, how it affects gene expression (whether a gene is highly active or being suppressed) and gene splicing (the process the body uses to make different proteins from a single gene). Crucially, the model can predict the impact of changing even a single letter in genetic code. Latysheva said she was "really excited" by the AI model's potential to understand which mutations cause disease and help pinpoint the cause of rare genetic diseases.
The AI model could be used to "add another piece of the puzzle for the discovery of drug targets and ultimately the development of new drugs", she added. Ultimately, it could also be used in synthetic biology and the design of new sequences of DNA which could be used in gene therapies. AlphaGenome has been described in the journal Nature, but was made available for non-commercial use last year and 3,000 scientists have since used the tool. Dr Gareth Hawkes, from the University of Exeter, is using it to explore how mutations could be altering our risk of obesity and diabetes. Studies that sequenced the entire genetic code of tens of thousands of people have identified variants linked to the conditions, but they are often in the dark genome. "They're directly impacting some important piece of biology that we don't really understand," Hawkes told the BBC. Using AlphaGenome allows researchers to rapidly predict what those variants are up to so they can be tested in the lab. Hawkes said: "Those predictions will help to inform which biological processes those genetic variants might be impacting, and potentially lead to drug developments. "I wouldn't say the dark side of the genome is solved by AlphaGenome, but it's a big leap. I'm really excited." Cancer is another field where the AI model could accelerate research. AlphaGenome has been used to predict which mutations are fuelling cancer and are also the potential targets of treatment, and which mutations are incidental. Dr Robert Goldstone, head of genomics at the Francis Crick Institute, said the model was a "major milestone in the field of genomic AI" and the breakthrough was "an incredible technical feat" for its "ability to predict gene expression from DNA sequence alone". Prof Ben Lehner, the head of generative and synthetic genomics at the Wellcome Sanger Institute, said they had tested AlphaGenome in more than half a million experiments and it was performing very well. 
But he said it was "far from perfect" and there was still a lot of work to do. "It's a really exciting time with three areas where the UK is world-leading - genomics, biomedical research and AI - combining to transform biology and medicine," Prof Lehner said. The team at DeepMind won the Nobel Prize for Chemistry in 2024 for their work on AlphaFold - an AI system that predicts the 3D structure of proteins in the body. "I think we are at the start of a new era of scientific progress, and AI is going to enable a number of different breakthroughs," says Pushmeet Kohli, vice president of science and strategic initiatives at Google DeepMind. AlphaGenome doesn't work like large language models (such as ChatGPT) that predict the next word in a sequence. Instead, it is a "sequence-to-function model" looking at how changes in the text affect the meaning at the end. It was trained on publicly available databases of human and mouse cell experiments. There is general agreement that the AI model needs refining. It is less accurate in some areas such as predicting how genes are regulated over long distances (more than 100,000 letters of code away). The team also want to improve the accuracy of the model in different tissues. A neuron in the brain, for example, has the same genetic code as a beating heart cell, but each has different properties based on the way the genetic instructions are being used in each cell type.
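One concrete detail of how a sequence-to-function model reads its "text": DNA is usually fed in as a one-hot matrix with one column per letter, rather than as word-like tokens. The sketch below assumes the common A/C/G/T column order and an all-zero row for an unknown base such as N; the exact input format AlphaGenome uses is not described in these articles. A million-letter window would become a 1,000,000-by-4 matrix, and a single-letter change alters exactly one row of it.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order


def one_hot(seq):
    """Encode DNA as a len(seq) x 4 matrix (columns A, C, G, T).

    Unknown letters such as 'N' become an all-zero row (one common convention).
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASE_INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


ref = one_hot("ACGTN")
alt = one_hot("ATGTN")  # single C->T change at position 1
changed = [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
print(ref)      # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(changed)  # [1]: a one-letter edit touches exactly one row of the input
```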
[5]
DeepMind's New AI Can Read a Million DNA Letters at Once -- and Actually Understand Them
AlphaGenome is reportedly the most comprehensive and accurate DNA sequence model developed to date.
Artificial intelligence has gotten a bad reputation lately, and often for good reason. But a team of scientists at Google's DeepMind now claims to have found a revolutionary use case for AI: helping humanity unravel the "dark matter" of our genome more effectively than ever before. In a study published today in Nature, DeepMind researchers debuted their deep learning model, dubbed AlphaGenome. Compared to existing models, AlphaGenome can predict the function of much longer sequences of DNA while still maintaining a similar level of accuracy, the researchers claim. The team is hopeful its model can become a valuable tool to analyze how subtle variations in human DNA can affect our health and biology, particularly in the vast majority of the genome that works silently in the background.
A guide to our genetic dark matter
Our DNA contains the instructions for building and regulating every biological aspect of ourselves. But only a tiny portion of our DNA, 2% or so, actually carries the code for the tens to hundreds of thousands of proteins that perform the functions a body needs to survive, such as insulin or collagen. The other 98% of our DNA is made of non-coding regions, more evocatively known as the dark matter of our genome. Scientists once assumed our genetic dark matter was composed of worthless junk DNA, but we now know that it contains sequences vital to regulating our protein-making genes. While scientists have mapped out most of the human genome, we still know very little about how much of it works, especially the non-coding regions; we're also largely in the dark about how variations in these sequences can affect their functioning.
Long before AI became a cultural buzzword (and punching bag), scientists had been using deep learning models -- trained on lab data -- to more efficiently sift through the mountains of the human genome and to predict a gene or DNA sequence's function. But DeepMind researchers say AlphaGenome is the most comprehensive and accurate DNA sequence model to date. The DeepMind researchers trained the model on both human and mouse genomes. It can reportedly analyze up to 1 megabase (Mb) -- about 1 million DNA letters -- at a time, compared to older models capable of analyzing up to about 500 kilobases (kb), often at the cost of resolution. From that sequence, the model is said to "predict thousands of functional genomic tracks." These tracks don't just include how a gene or DNA sequence is expressed but also other less visible functions. These include the interactions between coding and non-coding regions of DNA, or the structure of chromatin (the loosely packaged mix of DNA and proteins typically found in a cell; chromosomes are the more neatly packaged version). In the paper, the researchers also detailed how AlphaGenome matched or outperformed other existing AI models in 25 out of 26 tests measuring how well it could predict the effects of a genetic variant. Beyond accuracy, the model can also do more at once; it can simultaneously predict nearly 6,000 human genetic signals tied to specific functions, according to the researchers.
The future of AI genomics
At least some outside scientists have praised the capabilities of AlphaGenome, while noting that it can't solve every lingering mystery about our genetic code just yet. "At the Wellcome Sanger Institute we have tested AlphaGenome using over half a million new experiments and it does indeed perform very well," Ben Lehner, head of Generative and Synthetic Genomics at the University of Cambridge's Wellcome Sanger Institute, told the Science Media Centre.
"However, AlphaGenome is far from perfect and there is still a lot of work to do. AI models are only as good as the data used to train them. Most existing data in biology is not very suitable for AI -- the datasets are too small and not well standardized." All that said, the DeepMind researchers -- and others in the field -- believe AlphaGenome marks a true milestone in AI genomics, one that could help make the technology practical for broader use. They argue that AlphaGenome, or similar models, could now be used to better diagnose rare genetic diseases, identify mutations that drive cancer, or uncover new drug targets.
[6]
Google DeepMind launches AI tool to help identify genetic drivers of disease
AlphaGenome can analyse up to 1m letters of DNA code at once and could pave the way for new treatments
Researchers at Google DeepMind have unveiled their latest artificial intelligence tool and claimed it will help scientists identify the genetic drivers of disease and ultimately pave the way for new treatments. AlphaGenome predicts how mutations interfere with the way genes are controlled, changing when they are switched on, in which cells of the body, and whether their biological volume controls are set to high or low. Most common diseases that run in families, including heart disease and autoimmune disorders, as well as mental health problems, have been linked to mutations that affect gene regulation, as have many cancers, but identifying which genetic glitches are to blame is far from straightforward. "We see AlphaGenome as a tool for understanding what the functional elements in the genome do, which we hope will accelerate our fundamental understanding of the code of life," Natasha Latysheva, a DeepMind researcher, told a press briefing on the work. The human genome runs to 3bn pairs of letters - the Gs, Ts, Cs and As that comprise the DNA code. About 2% of the genome tells cells how to make proteins, the building blocks of life. The rest orchestrates gene activity, carrying the crucial instructions that dictate where, when and how much individual genes are switched on. The researchers trained AlphaGenome on public databases of human and mouse genetics, enabling it to learn connections between mutations in specific tissues and their impact on gene regulation. The AI can analyse up to 1m letters of DNA code at once and predict how mutations will affect different biological processes. The DeepMind team believes the tool will help scientists map out which strands of genetic code are most essential for the development of particular tissues, such as nerve and liver cells, and pinpoint the most important mutations for driving cancer and other diseases.
It could also underpin new gene therapies by allowing researchers to design entirely new DNA sequences - for example, to switch on a certain gene in nerve cells but not in muscle cells. Carl de Boer, a researcher at the University of British Columbia in Canada, who was not involved in the work, said: "AlphaGenome can identify whether mutations affect genome regulation, which genes are impacted and how, and in what cell types. A drug could then be developed to counteract this effect. "Ultimately, our goal is to have models that are so good we don't have to do an experiment to confirm their predictions. While AlphaGenome represents a significant innovation, achieving this goal will require continued work from the scientific community." Some scientists have already begun using AlphaGenome. Marc Mansour, a clinical professor of paediatric haemato-oncology at UCL, said it marked a "step change" in his work to find genetic drivers for cancer. Gareth Hawkes, a statistical geneticist at the University of Exeter, said: "The non-coding genome is 98% of our 3bn base pair genome. We understand the 2% fairly well, but the fact that we've got AlphaGenome that can make predictions of what this other 2.94bn base pair region is doing is a big step forward for us."
[7]
Google unveils AI tool probing mysteries of human genome
Paris (France) (AFP) - Google unveiled an artificial intelligence tool Wednesday that its scientists said would help unravel the mysteries of the human genome -- and could one day lead to new treatments for diseases. The deep learning model AlphaGenome was hailed by outside researchers as a "breakthrough" that would let scientists study and even simulate the roots of difficult-to-treat genetic diseases. While the first complete map of the human genome in 2003 "gave us the book of life, reading it remained a challenge", Pushmeet Kohli, vice president of research at Google DeepMind, told journalists. "We have the text," he said, which is a sequence of three billion nucleotide pairs represented by the letters A, T, C and G that make up DNA. However "understanding the grammar of this genome -- what is encoded in our DNA and how it governs life -- is the next critical frontier for research," said Kohli, co-author of a new study in the journal Nature. Only around two percent of our DNA contains instructions for making proteins, which are the molecules that build and run the body. The other 98 percent was long dismissed as "junk DNA" as scientists struggled to understand what it was for. However this "non-coding DNA" is now believed to act like a conductor, directing how genetic information works in each of our cells. These sequences also contain many variants that have been associated with diseases. It is these sequences that AlphaGenome is aiming to understand.
A million letters
The project is just one part of Google's AI-powered scientific work, which also includes AlphaFold, the winner of 2024's chemistry Nobel. AlphaGenome's model was trained on data from public projects that measured non-coding DNA across hundreds of different cell and tissue types in humans and mice. The tool is able to analyse long DNA sequences and then predict how each nucleotide pair will influence different biological processes within the cell.
This includes where genes start and stop and how much RNA -- molecules which transmit genetic instructions inside cells -- is produced. Other models already exist that have a similar aim. However they have to compromise, either by analysing far shorter DNA sequences or decreasing how detailed their predictions are, known as resolution. DeepMind scientist and lead study author Ziga Avsec said that long sequences -- up to a million DNA letters long -- were "required to understand the full regulatory environment of a single gene". And the high resolution of the model allows scientists to study the impact of genetic variants by comparing the differences between mutated and non-mutated sequences. "AlphaGenome can accelerate our understanding of the genome by helping to map where the functional elements are and what their roles are on a molecular level," study co-author Natasha Latysheva said. The model has already been tested by 3,000 scientists across 160 countries and is open for anyone to use for non-commercial purposes, Google said. "We hope researchers will extend it with more data," Kohli added.
'Breakthrough'
Ben Lehner, a researcher at Cambridge University who was not involved in developing AlphaGenome but did test it, said the model "does indeed perform very well". "Identifying the precise differences in our genomes that make us more or less likely to develop thousands of diseases is a key step towards developing better therapeutics," he explained. However AlphaGenome "is far from perfect and there is still a lot of work to do", he added. "AI models are only as good as the data used to train them" and the existing data is not very suitable, he said. Robert Goldstone, head of genomics at the UK's Francis Crick Institute, cautioned that AlphaGenome was "not a magic bullet for all biological questions". This was partly because "gene expression is influenced by complex environmental factors that the model cannot see", he said.
However the tool still represented a "breakthrough" that would allow scientists to "study and simulate the genetic roots of complex disease", Goldstone added.
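The idea of scoring a variant by comparing predictions for mutated and non-mutated sequences can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AlphaGenome or its API: `toy_predictor` is a hypothetical stand-in model (a fixed random weight per base, averaged over a 5-base window), and the one-hot input encoding is the common convention for DNA sequence models rather than a detail confirmed by the articles.

```python
import numpy as np

BASES = "ACGT"
_INDEX = {base: i for i, base in enumerate(BASES)}

# Hypothetical per-base weights for the toy model. The fixed seed means the
# reference and mutated sequences are scored by exactly the same "model".
_WEIGHTS = np.random.default_rng(0).normal(size=4)


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) one-hot matrix; unknown letters such as N stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float64)
    for pos, base in enumerate(seq.upper()):
        if base in _INDEX:
            out[pos, _INDEX[base]] = 1.0
    return out


def toy_predictor(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a sequence-to-function model (NOT AlphaGenome).

    Maps each base to a weight, then averages over a 5-base window to give
    one predicted signal value per position.
    """
    per_base = x @ _WEIGHTS
    return np.convolve(per_base, np.ones(5) / 5, mode="same")


def variant_effect(ref_seq: str, pos: int, alt: str) -> np.ndarray:
    """Score a single-nucleotide variant as prediction(ALT) minus prediction(REF)."""
    if alt not in _INDEX or ref_seq[pos].upper() == alt:
        raise ValueError("alt must be a base in ACGT that differs from the reference")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return toy_predictor(one_hot(alt_seq)) - toy_predictor(one_hot(ref_seq))


ref = "ACGTACGTACGTACGTACGT"
delta = variant_effect(ref, pos=10, alt="T")  # reference base at position 10 is G
# Under this toy model, only positions within 2 bases of the variant (8 to 12) change.
print(np.flatnonzero(~np.isclose(delta, 0.0)))
```

A real model would produce many output tracks per position and would be summarised into a single score in task-specific ways; the subtraction of the two predictions is the part this sketch is meant to show.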
Google DeepMind unveiled AlphaGenome, an AI model that analyzes up to 1 million DNA letters at once to predict how mutations affect gene expression. The tool tackles the non-coding regions of the human genome, the 98% of our DNA once dismissed as junk, and could accelerate research into genetic diseases, cancer, and gene therapies. Nearly 3,000 scientists across 160 countries have already used it.

Google DeepMind has introduced AlphaGenome, an AI model designed to decode the vast stretches of DNA that have long puzzled scientists [1]. The tool analyzes up to 1 million base pairs of DNA sequence at once, predicting how single-letter mutations in those stretches affect gene expression and other biological functions [2]. This marks a significant leap from previous models like Borzoi, which could handle only 500,000 DNA letters [2]. The human genome contains roughly 3 billion letters of genetic code, but only about 2% consists of genes that encode proteins [4]. The remaining 98%, non-coding DNA once dismissed as "junk", plays a crucial role in regulating gene activity and harbors many mutations linked to disease [3]. AlphaGenome targets this dark matter of the genome, offering researchers a way to predict the molecular impact of variations across long DNA sequences [3].
Described in Nature, AlphaGenome is a deep learning model trained on human and mouse genome data [3]. The AI model takes a DNA sequence as input and delivers predictions on 11 distinct biological processes, including gene expression, RNA splicing, DNA accessibility to proteins, and chromatin structure [2]. It can simultaneously predict nearly 6,000 human genetic signals tied to specific functions [5]. "The genome is like the recipe of life, and really understanding 'What is the effect of changing any part of the recipe?' is what AlphaGenome sort of looks at," said Pushmeet Kohli, Google DeepMind's vice president for science [1]. The model matched or outperformed state-of-the-art tools in 25 out of 26 tasks predicting the effects of genetic variants [3]. It identified gene activity changes in certain cell types 14.7% better than Borzoi [2]. A key advance is AlphaGenome's ability to pinpoint biologically important positions down to single-base resolution, far finer than Borzoi's 32-base-pair bins [2].
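The gap between single-base resolution and 32-base-pair bins can be illustrated with a toy signal. This is a minimal sketch: the 128-base track, the peak at base 45, and plain averaging within each bin are all hypothetical choices for illustration, not how either AlphaGenome or Borzoi actually computes its outputs.

```python
import numpy as np


def bin_track(signal: np.ndarray, bin_size: int) -> np.ndarray:
    """Average a per-base signal into non-overlapping bins of `bin_size` bases.

    Plain averaging is an assumption for illustration; real models may sum or
    aggregate differently.
    """
    if len(signal) % bin_size != 0:
        raise ValueError("signal length must be a multiple of bin_size")
    return signal.reshape(-1, bin_size).mean(axis=1)


# Hypothetical 128-base track: zero background with one sharp peak at base 45.
signal = np.zeros(128)
signal[45] = 32.0

# Single-base resolution: the peak position is recovered exactly.
peak_base = int(np.argmax(signal))

# 32-base-pair bins: the peak is averaged into bin 1 (bases 32 to 63), its
# height is diluted 32-fold, and the exact base can no longer be recovered.
binned = bin_track(signal, 32)
peak_bin = int(np.argmax(binned))
start, end = peak_bin * 32, peak_bin * 32 + 31

print(peak_base, binned.tolist(), (start, end))  # 45 [0.0, 1.0, 0.0, 0.0] (32, 63)
```

The binned view can only say that something happens somewhere in a 32-base window, which is why finer resolution matters when the question is which single letter a mutation changes.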
AlphaGenome functions as a research tool with potential applications in diagnosing rare conditions, identifying cancer-driving mutations, and designing gene therapies [1]. The model has already been used to simulate known DNA mutations responsible for a type of leukemia, predicting the same results observed in lab experiments [3]. Dr. Gareth Hawkes from the University of Exeter is using AlphaGenome to explore how mutations alter the risk of obesity and diabetes [4]. Studies that sequenced tens of thousands of people have identified variants linked to these conditions, often in the non-coding "dark" genome; AlphaGenome helps researchers rapidly predict what those variants do so that they can be tested in the lab [4]. In cancer research, the tool can predict which mutations fuel tumor growth and could serve as drug targets, distinguishing them from incidental mutations [4]. The model could also help researchers interpret the disease-linked variants found by genome-wide association studies and craft better treatments for genetic diseases [1].
AlphaGenome follows Google DeepMind's AlphaFold, an AI model that predicts protein structure from amino acid sequences; that work earned its developers a share of the 2024 Nobel Prize in Chemistry [1]. In 2023 the company also released AlphaMissense, which predicts how mutations in protein-coding regions affect protein function [1]. AlphaGenome extends this work to the entire genome, unifying multiple genomics tasks in a single model [3]. "Previously, the field required separate models for separate tasks," says Žiga Avsec, a research scientist leading DeepMind's genomics initiative [3]. Robert Goldstone, head of genomics at the Francis Crick Institute, called AlphaGenome "a foundational, high-quality tool that turns the static code of the genome into a decipherable language for discovery" [3].
Google DeepMind released a preview of AlphaGenome for non-commercial research in June last year, and nearly 3,000 scientists in 160 countries have since used it, submitting around 1 million requests daily [3]. Despite its capabilities, the model has limitations. It struggles to capture the influence of genetic alterations located more than 100,000 base pairs away from the genes they affect, and it can only make predictions for the cell types and species represented in its training data, primarily human and mouse [3]. Unpublished data indicate that the model has difficulty predicting how gene activity varies between individuals [2]. "Even when the model makes accurate predictions, it does not always directly inform us of the underlying biological processes," notes Jian Zhou, a genomics machine learning researcher at the University of Chicago [3]. Ben Lehner of the Wellcome Sanger Institute tested AlphaGenome against data from more than half a million experiments and found that it performs very well, but cautioned that "AlphaGenome is far from perfect and there is still a lot of work to do" [5]. Researchers suggest the next major advance will come from generating new types of data for the model to analyze [2].
