9 Sources
[1]
DeepMind's new AlphaGenome AI tackles the 'dark matter' in our DNA
Nearly 25 years after scientists completed a draft human genome sequence, many of its 3.1 billion letters remain a puzzle. The 98% of the genome that is not made of protein-coding genes -- but which can influence their activity -- is especially vexing. An artificial intelligence (AI) model developed by Google DeepMind in London could help scientists to make sense of this 'dark matter', and see how it might contribute to diseases such as cancer and influence the inner workings of cells. The model, called AlphaGenome, is described in a 25 June preprint. "This is one of the most fundamental problems not just in biology -- in all of science," Pushmeet Kohli, the company's head of AI for science said at a press briefing. The 'sequence to function' model takes long stretches of DNA and predicts various properties, such as the expression levels of the genes they contain and how those levels could be affected by mutations. "I think it is an exciting leap forward," says Anshul Kundaje, a computational genomicist at Stanford University in Palo Alto, California, who has had early access to AlphaGenome. "It is a genuine improvement in pretty much all current state-of-the-art sequence-to-function models." When DeepMind unveiled AlphaFold 2 in 2020, it went a long way to solving a problem that had challenged researchers for decades: determining how a protein's sequence contributes to its three-dimensional shape. Working out what DNA sequences do is different, because there is no one answer, as in a 3D structure that AlphaFold delivers. A single DNA stretch will have numerous, interconnected roles -- from attracting one set of cellular machinery to latch onto a particular section of a chromosome and turn a nearby gene into an RNA molecule, to attracting protein-transcription factors that influence where, when and to what extent gene expression occurs. 
Many DNA sequences, for example, influence gene activity by altering a chromosome's 3D shape, either restricting or easing access for the machinery that does the transcription. Biologists have been chipping away at this question for decades with various kinds of computational tools. In the last decade or so, scientists have developed dozens of AI models to make sense of the genome. Many of these have focused on an individual task, such as predicting levels of gene expression or determining how modular segments of individual genes, called exons, are cut-and-pasted into distinct proteins. But scientists are increasingly interested in 'all in one' tools for interpreting DNA sequences. AlphaGenome is one such model. It can take inputs of up to one million DNA letters -- a stretch that could include a gene and myriad regulatory elements -- and make thousands of predictions about numerous biological properties. In many cases, AlphaGenome's predictions are sensitive to single-DNA-letter changes, which means that scientists can predict the consequences of mutations. In one example, DeepMind researchers applied the AlphaGenome model to diverse mutations identified in previous studies in people with a type of leukaemia. The model accurately predicted that the non-coding mutations indirectly activated a nearby gene that is a common driver of this cancer. AlphaGenome was trained on genomic and other experimental data from humans and mice only. It might work as well on related organisms, but the researchers didn't test this, said Žiga Avsec, a DeepMind scientist, at the briefing. Neither was the model designed to reliably interpret an individual's genome, or to provide a full picture of how variants influence complex diseases. There is room for improvement in the accuracy of AlphaGenome's predictions. For instance, the model struggles to identify sequences that alter the expression of a gene located more than 100,000 base pairs away.
"This model has not yet 'solved' gene regulation to the same extent as AlphaFold has, for example, protein 3D structure prediction," adds Kundaje. One thing that AlphaGenome -- and similar models -- don't yet capture is how a cell's changing nature can affect how DNA sequences function, says Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York. These models are trained to make predictions in one fixed setting, but cells are dynamic: protein levels, chemical tags on DNA, and other conditions can shift over time or between cell types -- and that can change how the same sequence behaves. Koo predicts that researchers will build on AlphaGenome by using the model to design 'regulatory' DNA sequences that allow control over when and where a gene is active, for instance, or to run virtual experiments that simulate how cells respond to genetic changes. For now, researchers doing non-commercial work can access the model through DeepMind's servers using a programming interface. A fuller release -- that would enable more-sophisticated applications -- is planned for the future.
[2]
DeepMind's latest AI tool makes sense of changes in the human genome
The human genome offers a complete instruction manual for building a person, but it's a tough read. How does a given letter among billions of DNA bases affect how the body functions? Now, DeepMind, the Google spinoff behind the artificial intelligence (AI) model AlphaFold that mastered how proteins fold into their 3D shapes, has tackled that broader challenge. Its new AI tool, AlphaGenome, reveals how simple genetic changes affect the expression of genes, altering the kinds and amounts of RNA and proteins they produce. AlphaGenome, described today in a preprint, is expected to make it easier for researchers to pinpoint the causes of diseases by more accurately tracking the consequences of genetic mutations, and help synthetic biologists design new genes from scratch. DeepMind officials say they are still working out how they will make the tool commercially available. But academic researchers can use it for free. "This is going to be an extremely useful tool," says Caleb Lareau, a systems biologist at Memorial Sloan Kettering Cancer Center who was given early access to the AI. "This is the most comprehensive attempt to annotate and explain every possible change of the 3-billion-letter sequence in the human genome. It's the strongest in silico tool we've had to date." DeepMind's latest AI builds on AlphaFold's previous success at predicting protein folding, which earned a pair of company researchers a share of last year's Nobel Prize in Chemistry. However, understanding how changes in DNA affect an organism broadly "is more of a fuzzy field," says Natasha Latysheva, a DeepMind genome researcher. A single genetic mutation can have complex, cascading effects on gene expression and how much of a protein is produced. AI developers have been tackling this complexity piece by piece. 
They've designed individual algorithms to search swaths of DNA for likely protein coding regions, recognize genetic patterns associated with disease, interpret genetic variations, predict disease risk, and help tailor clinical treatments. AlphaGenome unites many of these analyses and others into a single package. It relies on massive molecular biology databases produced over decades by publicly funded consortia. These include the results of experiments tracking how certain mutations in human and mouse cells affect properties such as the production of RNA, which translates the genome's blueprints into proteins, and levels of transcription factors, proteins that can turn genes on and off. By training on those data sets, AlphaGenome has learned to decipher DNA, identifying both genes and the nongene sequences that orchestrate gene activity, along with the genetic variants most likely to produce consequential changes. To use the new AI, researchers feed it a DNA sequence up to 1 million base pairs long. The model then predicts the locations where genes start and end, which can vary among different cell types. It also captures the intricacies of RNA processing -- which forms of RNA are created from those genes, and how much RNA they produce. AlphaGenome can then predict how altering a single letter of DNA affects the expression of genes and alters their RNA and protein products. Whereas other AIs can do some of this analysis for the estimated 2% of the genome in protein-coding genes, AlphaGenome is the first to manage the same feat for the full genome. "For the first time, an AI model can predict exactly where and how an RNA [variant] is expressed directly from a sequence of DNA," says Hani Goodarzi, a genomics AI model builder at the University of California San Francisco. "This allows us to see not just if a gene is expressed, but how the resulting RNA will be processed." 
The DeepMind team reports that AlphaGenome outperformed 22 of 24 other computer models at identifying specific features in single DNA sequences, such as coding and noncoding regions as well as transcription factor binding sites, and bested 24 of 26 models on predicting the effect of a variant on gene regulation. Researchers expect the new AI to help them pinpoint consequential genetic changes. Marc Mansour, a cancer molecular biologist at University College London, says when his lab compares the genomes of cells from patients' cancerous tissues with their unaffected cells, thousands of individual letter changes emerge. "It's very hard to work out whether any particular change will have a functional consequence," he says. AlphaGenome, he notes, ranks the variants most likely to be consequential, allowing researchers to focus their follow-up studies. That ranking capability "is hugely important for my research," adds Lareau, whose lab analyzes the effect of genetic changes on immune function. "Instead of testing hundreds of things, I can focus on a couple, having been guided to the right spot." The power to predict how genetic changes affect gene expression should be equally valuable to synthetic biologists, Latysheva adds. The AI could suggest whether newly devised genetic sequences would have beneficial effects, before further testing those effects in lab experiments. AlphaGenome's developers plan to release its underlying source code and model weights, which determine how a model generates an output, when a peer-reviewed version of the paper is published, enabling researchers to customize it for their own projects. Asked whether that could make it easier for bad actors to design bioweapons, Pushmeet Kohli, DeepMind's vice president of research, said the company shared the model with outside biosecurity experts. "We got feedback that it is quite safe, and that in releasing it, the benefits far outweigh the risks." 
Kohli added that DeepMind hopes to continue to expand AlphaGenome's capabilities, such as providing better insight into how genetic variations lead to complex traits or diseases. Kohli says: "What we have today is like AlphaFold 1 -- a big first step."
[3]
Google's new AI will help researchers understand how our genes work
Now Google's DeepMind division says it's made a leap in trying to understand the code with AlphaGenome, an AI model that predicts what effects small changes in DNA will have on an array of molecular processes, such as whether a gene's activity will go up or down. It's just the sort of question biologists regularly assess in lab experiments. "We have, for the first time, created a single model that unifies many different challenges that come with understanding the genome," says Pushmeet Kohli, a vice president for research at DeepMind. Five years ago, the Google AI division released AlphaFold, a technology for predicting the 3D shape of proteins. That work was honored with a Nobel Prize last year and spawned a drug-discovery spinout, Isomorphic Labs, and a boom of companies that hope AI will be able to propose new drugs. AlphaGenome is an attempt to further smooth biologists' work by answering basic questions about how changing DNA letters alters gene activity and, eventually, how genetic mutations affect our health. "We have these 3 billion letters of DNA that make up a human genome, but every person is slightly different, and we don't fully understand what those differences do," says Caleb Lareau, a computational biologist at Memorial Sloan Kettering Cancer Center who has had early access to AlphaGenome. "This is the most powerful tool to date to model that." Google says AlphaGenome will be free for noncommercial users and plans to release full details of the model in the future. According to Kohli, the company is exploring ways to "enable use of this model by commercial entities" such as biotech companies.
[4]
Google unveils new AI model to decode one million DNA letters at once
It brings base-resolution insight to long-range genomic analysis, decoding the impact of mutations with speed, scale, and unprecedented depth. The model processes up to 1 million base pairs in a single pass and predicts thousands of molecular properties, including gene expression, splicing patterns, protein-binding sites, and chromatin accessibility across diverse cell types. It's the first time such a wide range of regulatory features can be modeled jointly using one AI system. AlphaGenome's architecture first uses convolutional layers to spot short patterns in the DNA sequence, then applies transformers to share information across the entire stretch of genetic code. A final set of layers converts these learned patterns into predictions across various genomic features. During training, all computations for a single sequence are distributed across multiple interconnected Tensor Processing Units (TPUs), enabling efficient large-scale processing. A single model was trained in just four hours, using half the compute budget required for its predecessor, Enformer.
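The three-stage layout described above (convolutions for short patterns, attention to share information across positions, a final layer that emits a prediction track) can be illustrated with a toy, pure-Python sketch. Everything here is an assumption made for illustration: the function names, the single hand-written "TATA" filter, and the scalar attention are hypothetical and are not AlphaGenome's actual layers or weights.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d(x, kernel):
    """Valid-padding 1D cross-correlation of an (L x 4) input with a (k x 4) kernel."""
    k = len(kernel)
    return [
        sum(x[i + j][c] * kernel[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]

def self_attention(h):
    """Toy single-head attention on scalars: every position mixes in all others."""
    out = []
    for q in h:
        weights = [math.exp(q * kv) for kv in h]  # unnormalised attention weights
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, h)) / total)
    return out

def head(h, scale=1.0, bias=0.0):
    """Final per-position linear layer that turns features into a prediction track."""
    return [scale * v + bias for v in h]

# Hypothetical filter: a one-hot "TATA" motif used as the convolution kernel.
motif_kernel = one_hot("TATA")
sequence = "GGTATACC"

features = conv1d(one_hot(sequence), motif_kernel)  # short-pattern detection
mixed = self_attention(features)                    # long-range information sharing
track = head(mixed)                                 # one output value per position
print(features)
```

The convolution output peaks where the motif matches exactly; the attention step then lets that peak influence every other position, which is the role the article attributes to the transformer layers.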
[5]
AlphaGenome: AI for better understanding the genome
Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function -- now available via API. The genome is our cellular instruction manual. It's the complete set of DNA which guides nearly every part of a living organism, from appearance and function to growth and reproduction. Small variations in a genome's DNA sequence can alter an organism's response to its environment or its susceptibility to disease. But deciphering how the genome's instructions are read at the molecular level -- and what happens when a small DNA variation occurs -- is still one of biology's greatest mysteries. Today, we introduce AlphaGenome, a new artificial intelligence (AI) tool that more comprehensively and accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes regulating genes. This was enabled, among other factors, by technical advances allowing the model to process long DNA sequences and output high-resolution predictions. To advance scientific research, we're making AlphaGenome available in preview via our AlphaGenome API for non-commercial research, and planning to release the model in the future. We believe AlphaGenome can be a valuable resource for the scientific community, helping scientists better understand genome function, disease biology, and ultimately, drive new biological discoveries and the development of new treatments. Our AlphaGenome model takes a long DNA sequence as input -- up to 1 million letters, also known as base-pairs -- and predicts thousands of molecular properties characterising its regulatory activity. It can also score the effects of genetic variants or mutations by comparing predictions of mutated sequences with unmutated ones. 
Predicted properties include where genes start and where they end in different cell types and tissues, where they get spliced, the amount of RNA being produced, and also which DNA bases are accessible, close to one another, or bound by certain proteins. Training data was sourced from large public consortia including ENCODE, GTEx, 4D Nucleome and FANTOM5, which experimentally measured these properties covering important modalities of gene regulation across hundreds of human and mouse cell types and tissues.
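The variant-scoring idea described above, comparing predictions for a mutated sequence with those for the unmutated one, can be sketched as follows. This is a minimal illustration, not the AlphaGenome API: `toy_predict` is a hypothetical stand-in for a sequence-to-function model (it only flags exact "TATA" matches), and `score_variant` and `apply_snv` are names invented for this example.

```python
def apply_snv(seq, pos, alt):
    """Return a copy of seq with the base at 0-based position pos replaced by alt."""
    if alt not in "ACGT":
        raise ValueError("alt must be one of A, C, G, T")
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]

def toy_predict(seq, motif="TATA"):
    """Hypothetical model stand-in: 1.0 wherever the motif starts, else 0.0."""
    k = len(motif)
    return [1.0 if seq[i:i + k] == motif else 0.0 for i in range(len(seq) - k + 1)]

def score_variant(seq, pos, alt, predict=toy_predict):
    """Score a single-letter variant as (alternate prediction) - (reference prediction)."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, alt))
    return sum(a - r for a, r in zip(alt_track, ref_track))

# A variant that destroys the motif scores negative; one that creates it scores positive.
loss = score_variant("GGTATACC", 3, "G")
gain = score_variant("GGTCTACC", 3, "A")
print(loss, gain)
```

Because the predictor is passed in as an argument, the same comparison logic would apply to any model that maps a sequence to a numeric track.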
[6]
Google's AlphaGenome AI Makes DNA Readable -- And It's on GitHub
Google's model is available to researchers via API, signaling a new era of more open and accessible genomics. Google DeepMind's AlphaGenome, which was announced today, isn't just another entry in the AI-for-science arms race. With API access available for non-commercial research -- and extensive documentation and community support hosted on GitHub -- it signals that genomics, once confined to specialized labs and paywalled datasets, is moving rapidly toward open science. This is a pretty big deal. Imagine your DNA is like a giant instruction manual for how your body works. For a long time, scientists could only really understand the parts that directly told your body how to build things, like proteins. But most of your DNA -- over 90% of it -- isn't like that. It doesn't build anything directly. People used to call it "junk DNA." Now we know that "junk" is actually doing something important: it helps control when and where the real instructions are used -- kind of like a control panel full of switches and dials. The problem? It's really hard to read and understand. That's where AlphaGenome comes in. AlphaGenome is a powerful AI model built by Google DeepMind that can read these confusing parts of DNA better than anything before it. It uses advanced machine learning (like the kind behind image generators or chatbots) to look at huge sections of DNA -- up to a million letters long -- and figure out which parts are important, how they affect your genes, and even how mutations might lead to disease. It's kind of like having a super-smart AI microscope that not only reads the manual, but figures out how the whole system turns on and off -- and what happens when things go wrong. What's cool is that DeepMind is sharing this tool through an API (a way for computers to talk to it), so scientists and medical researchers around the world can use it for free in their research. 
This means it could help speed up discoveries in things like genetic diseases, personalized medicine, and even anti-aging treatments. In short: AlphaGenome helps scientists read the parts of our DNA we didn't understand before -- and that could change everything about how we treat disease. AlphaGenome is a deep learning model designed to analyze how DNA sequences regulate gene expression and other critical functions. Unlike older models that parsed short DNA fragments, AlphaGenome can process sequences up to one million base pairs long -- an unprecedented scale that allows it to capture distant regulatory interactions missed by previous methods. AlphaGenome's core strength is its multimodal prediction engine. Unlike previous models that could predict one type of genomic activity, this model outputs high-resolution forecasts for gene expression (RNA-seq, CAGE), splicing events, chromatin states (including DNase sensitivity and histone modifications), and 3D chromatin contact maps. That makes it useful not only for pinpointing which genes are turned on or off in a cell, but for understanding the complex choreography of genome folding, editing, and accessibility. The architecture is notable, but still pretty familiar if you have been using Stable Diffusion or a normal open-source LLM locally: AlphaGenome uses a U-Net-inspired neural network, with about 450 million trainable parameters. Yes, that is pretty low if you match it against even the weak and smaller language models that work with billions of parameters. However, considering that DNA only deals with 4 bases and only two pairs -- basically the entire human genome is nothing but a combination of 3 billion pairs of A-T and C-G pairs of letters -- it is a very specific model, designed to do one single thing extremely well. 
The model has a sequence encoder that downsamples input from single-base resolution to coarser representations, then transformer layers model long-range dependencies before the decoder reconstructs outputs back to the single-base level. This enables predictions at various resolutions, allowing for both fine-grained and broad regulatory analyses. The model's training relied on a wide array of publicly available datasets, including ENCODE, GTEx, 4D Nucleome, and FANTOM5 -- resources that collectively represent thousands of experimental profiles across human and mouse cell types. And this process was also quite fast: using Google's custom TPUs, DeepMind trained a single model, without distillation, in just four hours, using half the computational budget required by its predecessor, Enformer. AlphaGenome outperformed state-of-the-art models in 22 out of 24 sequence prediction tests and matched or exceeded them in 24 out of 26 variant effect predictions, a rare near-sweep in benchmarks where incremental improvements are the norm. It does the job so well, in fact, that it can compare mutated and unmutated DNA, predicting the impact of genetic variants in seconds -- a critical tool for researchers mapping disease origins. This matters, because the non-coding genome contains many of the regulatory switches that control cell function and disease risk. Models like AlphaGenome are revealing how much of human biology is governed by these previously opaque regions. AI's influence on biology today is hard to ignore. Take Ankh, a protein language model developed by teams from the Technical University of Munich, Columbia University, and the startup Protinea. Ankh treats protein sequences like language, generating new proteins and predicting their behavior -- similar to how AlphaGenome translates the regulatory "grammar" of DNA. Another adjacent tech, Nvidia's GenSLMs, demonstrates AI's ability to forecast viral mutations and cluster genetic variants for pandemic research.
Meanwhile, the use of AI to foster advances in chemical and gene-based anti-aging interventions highlights the intersection of genomics, machine learning, and medicine. One of AlphaGenome's most significant contributions is its accessibility. Rather than being restricted to commercial applications, the model is available via a public API for non-commercial research. While it is not fully open sourced yet -- meaning researchers can't download and run or modify it locally -- the API and accompanying resources allow scientists worldwide to generate predictions, adapt analyses for various species or cell types, and provide feedback to shape future releases. DeepMind has signaled plans for a broader open-source release down the line. AlphaGenome's ability to analyze non-coding variants -- the area where most disease-linked mutations are found -- could unlock new understanding of genetic disorders and rare diseases. Its high-speed variant scoring also supports personalized medicine, where treatments are tailored to an individual's unique DNA profile. For now, the non-coding genome is less of a black box, and AI's role in genomics is set only to expand. AlphaGenome may not be the model to take us to Huxley's "Brave New World," but it's a clear sign of where things are headed: more data, better predictions, and a deeper understanding of how life works.
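The encoder-decoder layout this article describes (downsample from single-base resolution, then reconstruct back to it) can be illustrated with a small sketch. The pooling factor of 2, the seven stages, and the input length of 2**20 are assumptions chosen for illustration; the article does not give these numbers, and the function names are hypothetical.

```python
def downsample(track, factor=2):
    """Encoder step: mean-pool a track into bins of `factor` positions."""
    if len(track) % factor:
        raise ValueError("track length must be divisible by factor")
    return [sum(track[i:i + factor]) / factor for i in range(0, len(track), factor)]

def upsample(track, factor=2):
    """Decoder step: repeat each coarse value `factor` times."""
    return [v for v in track for _ in range(factor)]

def bin_count(input_len, num_stages, factor=2):
    """Number of positions left after `num_stages` rounds of downsampling."""
    return input_len // factor ** num_stages

fine = [1.0, 3.0, 5.0, 7.0]
coarse = downsample(fine)      # half as many positions, each an average of two
restored = upsample(coarse)    # back to the original length, but blurred
# U-Net-style skip connection: add the fine-resolution signal back in.
with_skip = [r + f for r, f in zip(restored, fine)]

# Assumed numbers: a 2**20-letter input pooled 7 times leaves 8,192 positions,
# a far shorter sequence for attention layers to process.
print(coarse, restored, with_skip, bin_count(2**20, 7))
```

The skip connection is what lets the decoder recover base-level detail that the pooled representation alone has averaged away.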
[7]
DeepMind Launches AlphaGenome to Predict How DNA Variants Affect Gene Regulation | AIM
The model could help link genetic mutations to diseases like cancer, paving the way for better treatments. DeepMind has launched AlphaGenome, a new artificial intelligence (AI) model that can predict how single DNA variants affect gene regulation across the human genome. The model, now available via API for non-commercial research, marks an advance in understanding the genome's non-coding regions, that is, areas long considered the "dark matter" of DNA. AlphaGenome can analyse up to 1 million DNA base pairs and delivers high-resolution predictions about thousands of molecular processes, such as where genes begin and end, how RNA is spliced and which proteins bind to DNA. This predictive ability, according to DeepMind, offers a "unifying model" to help scientists better understand gene function and the impact of mutations. "It's a milestone for the field," Dr Caleb Lareau of Memorial Sloan Kettering Cancer Center said in the blog post. "For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks." Building on the earlier Enformer model, and unlike AlphaMissense, which focuses on protein-coding regions, AlphaGenome is designed to analyse the remaining 98% of the genome, non-coding regions that regulate gene activity and are often linked to disease. DeepMind claims the model offers a new way to explore these vast areas with unprecedented detail. The architecture combines convolutional layers to detect short patterns, transformer models to capture long-range dependencies and final layers to produce predictions. According to the company, AlphaGenome outperformed top external models in 22 of 24 sequence prediction benchmarks and matched or exceeded others in 24 of 26 variant-effect tasks.
In a test case involving T-cell acute lymphoblastic leukaemia (T-ALL), AlphaGenome successfully predicted how specific mutations activate the cancer-related TAL1 gene by creating a new binding site for the MYB protein, replicating a known disease mechanism. The result underscored the model's potential to link non-coding variants to disease outcomes. "AlphaGenome will be a powerful tool for the field," Professor Marc Mansour of University College London explained in the post. "Determining the relevance of different non-coding variants can be extremely challenging, particularly to do at scale. This tool provides a crucial piece of the puzzle." DeepMind acknowledges some limitations. The model still struggles to predict the effects of very distant DNA interactions, more than 100,000 letters apart, and has not been validated for personal genome interpretation or clinical use. Researchers are invited to access AlphaGenome through its preview API and collaborate via DeepMind's community forum. The company says the model could accelerate discovery across disease research, synthetic biology, and basic science. "We hope AlphaGenome will deepen our understanding of the complex cellular processes encoded in the DNA sequence and drive exciting new discoveries in genomics and healthcare," DeepMind said in a statement.
[8]
DeepMind launches AlphaGenome to predict how DNA mutations affect genes - SiliconANGLE
Alphabet Inc.'s Google DeepMind today introduced AlphaGenome, a new artificial intelligence tool that can comprehensively predict how mutations or variants in human DNA sequences impact gene regulation. The genome is the complete set of deoxyribonucleic acid, or DNA, within a living cell, which includes all the genetic information necessary for development, growth, and functioning. In humans, the genome consists of 23 pairs of chromosomes located in the nucleus of the cell, and it regulates everything, including the response to the environment and susceptibility to disease. The new AlphaGenome model can take an extremely long DNA sequence as input -- up to 1 million letters, also known as base pairs, the recognizable A, T, C and G -- and predict thousands of molecular properties. The properties that it can predict include where genes start and where they end in different cell types and tissues, where they get spliced and the number of proteins they produce. Proteins are the building blocks of tissues and enzymes, which are required to take action in the body. It can also tell which DNA bases are close to each other or bound by certain proteins. DeepMind trained the model on a considerable amount of scientific data from large public consortia that include information about gene regulation, including ENCODE, GTEx, 4D Nucleome and others. Not only can the AI model "see" a large number of DNA letters at once to make predictions about how the genes will behave, but it can also make those predictions at the resolution of individual letters. The long sequence length is crucial for covering regulatory regions that are distant from the genes they control. "Previous models had to trade off sequence length and resolution, which limited the range of modalities they could jointly model and accurately predict," the DeepMind team said.
This capability makes the AI model useful for predicting "splice" errors. These errors can cause rare genetic diseases like spinal muscular atrophy and some forms of cystic fibrosis. Think of DNA as the script for a training video, and ribonucleic acid, or RNA, as the raw footage. Before the final cut, the cell "edits" the RNA, removing unnecessary parts and stitching together the important scenes. But sometimes, the editing goes wrong -- key scenes are left out, or extra ones are included -- resulting in a flawed final product. These mistakes, called splice junction errors, can disrupt how the body works. According to DeepMind, AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks, including predicting which parts of the DNA molecule will be in close proximity, whether genetic variants will increase or decrease expression of a gene or if it will change a gene-splicing pattern. "It's a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks," said Dr. Caleb Lareau, a researcher at Memorial Sloan Kettering Cancer Center. DeepMind said that it expects that AlphaGenome will be a powerful research tool for disease understanding by helping to accurately predict genetic disruptions. It could also be used to help guide the design of synthetic DNA with specific regulatory functions and accelerate the understanding of the genome by assisting in the understanding of its crucial functional elements.
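The "editing" analogy above can be made concrete with a toy splicing sketch: introns are dropped and exons are joined, and a splicing error either skips an exon or retains an intron. The sequences and coordinates below are invented for illustration and do not correspond to any real gene; `splice` is a hypothetical helper name.

```python
def splice(transcript, kept_intervals):
    """Join the kept intervals (0-based, end-exclusive) of a transcript in order."""
    return "".join(transcript[start:end] for start, end in kept_intervals)

# Made-up pre-mRNA: three exons separated by two introns.
exon1, intron1, exon2, intron2, exon3 = (
    "AUGGCC", "GUAAGUCCAG", "UUUGGG", "GUAAAUACAG", "CCCUAA"
)
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3
exons = [(0, 6), (16, 22), (32, 38)]

normal = splice(pre_mrna, exons)                          # all three exons joined
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])     # "key scene left out"
intron_retained = splice(pre_mrna, [(0, 22), exons[2]])   # "extra scene included"

print(normal)
print(exon_skipped)
print(intron_retained)
```

A model that predicts splice junctions from sequence is, in effect, predicting which of these interval lists the cell will use, and how a variant shifts that choice.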
[9]
AlphaGenome reshapes how scientists interpret mutations
A new artificial intelligence tool, AlphaGenome, has been introduced to predict how DNA sequence variations impact gene regulation, now available via API for non-commercial research. The genome functions as the cellular instruction manual, containing the complete set of DNA that directs an organism's appearance, function, growth, and reproduction. Small variations within this DNA sequence can alter an organism's environmental response or disease susceptibility. Deciphering the molecular-level reading of genomic instructions and the implications of minor DNA variations remains a significant challenge in biology. AlphaGenome is an AI tool designed to more comprehensively and accurately predict how single variants or mutations in human DNA sequences influence a broad range of biological processes that regulate genes. Technical advancements, including the model's capacity to process long DNA sequences and generate high-resolution predictions, enabled its development. AlphaGenome is currently accessible in preview through the AlphaGenome API for non-commercial research, with plans for a full release in the future. The AlphaGenome model accepts DNA sequences up to 1 million base pairs in length as input. It then predicts thousands of molecular properties that characterize the sequence's regulatory activity. The tool can also score the effects of genetic variants by comparing predictions generated from mutated sequences with those from unmutated sequences. Predicted properties encompass gene start and end locations across various cell types and tissues, splicing sites, RNA production levels, and the accessibility, proximity, or protein-binding status of DNA bases. Training data for AlphaGenome originated from large public consortia, including ENCODE, GTEx, 4D Nucleome, and FANTOM5. These consortia experimentally measured gene regulation properties across hundreds of human and mouse cell types and tissues, covering important modalities.
The AlphaGenome architecture uses convolutional layers to detect short patterns in the genome sequence, transformers to communicate information across all positions in the sequence, and a final series of layers to turn the detected patterns into predictions for different modalities. During training, the computation for a single sequence is distributed across multiple interconnected Tensor Processing Units (TPUs). The model builds upon Enformer, an earlier genomics model, and complements AlphaMissense, which specializes in categorizing the effects of variants within protein-coding regions. Protein-coding regions make up 2% of the genome; the remaining 98%, known as non-coding regions, are critical for orchestrating gene activity and contain many disease-linked variants. AlphaGenome offers a new perspective for interpreting these extensive sequences and the variants within them. AlphaGenome has several distinctive features compared with existing DNA sequence models. It analyzes up to 1 million DNA letters and makes predictions at the resolution of individual letters. Long sequence context is important for covering regions that regulate genes from far away, while base resolution is important for capturing fine-grained biological details. Previous models had to trade off sequence length against resolution, which limited the range of modalities they could jointly model and accurately predict. Technical advances in AlphaGenome address this limitation without significantly increasing training resources: training a single AlphaGenome model without distillation took four hours and required half the compute budget used for the original Enformer model. By enabling high-resolution prediction for long input sequences, AlphaGenome can predict the most diverse range of modalities, giving scientists more comprehensive information about the complex steps of gene regulation.
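The idea that convolutional layers "detect short patterns" can be illustrated with a toy example. The sketch below one-hot encodes a DNA string and slides a single hand-written filter along it in plain Python. It is not AlphaGenome's code: in a trained model the filter weights are learned from data and there are many filters stacked in layers, whereas here the filter is simply the one-hot encoding of the string "TATA", chosen for illustration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four numbers (A, C, G, T).
    Characters other than A, C, G, T become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide one filter (k rows x 4 columns) along the sequence, no padding.
    Returns one activation per valid start position."""
    k = len(kernel)
    activations = []
    for start in range(len(encoded) - k + 1):
        total = 0.0
        for offset in range(k):
            row = encoded[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        activations.append(total)
    return activations


# Hand-written filter: responds most strongly where the sequence reads "TATA"
tata_filter = one_hot("TATA")
activations = conv1d(one_hot("GGTATAAC"), tata_filter)
best = max(range(len(activations)), key=activations.__getitem__)
print(activations, "strongest match starts at index", best)
```

Each activation counts how many of the four letters agree with the filter at that offset, so an exact match scores 4. A convolution of this kind only sees a few letters at a time, which is why the architecture described above pairs it with transformers that relate distant positions.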
In addition to predicting a diverse range of molecular properties, AlphaGenome can efficiently score the impact of a genetic variant on all these properties within a second. It accomplishes this by contrasting predictions from mutated sequences with those from unmutated ones, summarizing that contrast efficiently using different approaches for various modalities. For the first time, AlphaGenome can explicitly model the location and expression level of splice junctions directly from sequence. This offers insights into the consequences of genetic variants on RNA splicing, a process where parts of the RNA molecule are removed and remaining ends rejoined, relevant to rare genetic diseases like spinal muscular atrophy and certain forms of cystic fibrosis. AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks. These benchmarks include predicting DNA proximity, whether a genetic variant will increase or decrease gene expression, or if it will alter a gene's splicing pattern. In producing predictions for single DNA sequences, AlphaGenome outperformed the best external models in 22 out of 24 evaluations. For predicting the regulatory effect of a variant, it matched or exceeded the top-performing external models in 24 out of 26 evaluations. These comparisons included models specialized for individual tasks. AlphaGenome was the only model capable of jointly predicting all assessed modalities, demonstrating its generality. AlphaGenome's generality allows scientists to simultaneously explore a variant's impact on multiple modalities with a single API call. This facilitates more rapid hypothesis generation and testing, eliminating the need for multiple models to investigate different modalities. AlphaGenome's strong performance indicates it has learned a general representation of DNA sequence in the context of gene regulation, providing a foundation for the wider community. 
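The variant-scoring recipe described above (predict for the unmutated sequence, predict for the mutated one, then summarize the contrast) can be sketched in a few lines. Everything here is illustrative: `toy_predict` is a stand-in that merely marks G and C bases and has nothing to do with the real model, and the two summary functions are assumed examples of how a per-base contrast could be reduced to a single number.

```python
def apply_snv(seq, pos, ref, alt):
    """Return `seq` with the single base at `pos` changed from `ref` to `alt`."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_predict(seq):
    """Stand-in for a sequence-to-function model: one value per base
    (1.0 for G or C, 0.0 otherwise). Purely a placeholder."""
    return [1.0 if base in "GC" else 0.0 for base in seq]


def total_change(ref_track, alt_track):
    """Summary 1: change in the track's total signal."""
    return sum(alt_track) - sum(ref_track)


def max_abs_change(ref_track, alt_track):
    """Summary 2: largest per-base change, ignoring direction."""
    return max(abs(a - r) for r, a in zip(ref_track, alt_track))


def score_variant(seq, pos, ref, alt, predict=toy_predict, summarise=total_change):
    """Contrast predictions for the unmutated and mutated sequences."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, ref, alt))
    return summarise(ref_track, alt_track)


print(score_variant("ACGTAC", 0, "A", "G"))                            # signed change
print(score_variant("ACGTAC", 1, "C", "T", summarise=max_abs_change))  # magnitude only
```

Keeping the summary function pluggable mirrors the point in the text that different modalities call for different ways of condensing the contrast: a signed total suits "does expression go up or down", while a maximum suits localized effects.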
Upon full release, scientists will be able to adapt and fine-tune the model on their own datasets to address specific research questions. This approach offers a flexible and scalable architecture for the future: with expanded training data, the model could gain extended capabilities, better performance, coverage of more species, or additional modalities. Dr. Caleb Lareau, of Memorial Sloan Kettering Cancer Center, said, "It's a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks." AlphaGenome's predictive capabilities could aid several research areas. In disease understanding, it could help researchers pinpoint potential causes of disease and interpret the functional impact of variants linked to traits, potentially uncovering new therapeutic targets. The model is considered suitable for studying rare variants with potentially large effects, such as those causing rare Mendelian disorders. In synthetic biology, its predictions could guide the design of synthetic DNA with specific regulatory functions, for example DNA that activates a gene in nerve cells but not in muscle cells. In fundamental research, it could accelerate understanding of the genome by helping to map its crucial functional elements and define their roles, identifying the DNA instructions most essential for regulating the function of a specific cell type. For example, AlphaGenome was used to investigate the potential mechanism of a cancer-associated mutation. In a study of patients with T-cell acute lymphoblastic leukemia (T-ALL), researchers observed mutations at specific genomic locations. AlphaGenome predicted that these mutations would activate a nearby gene called TAL1 by introducing a MYB DNA binding motif. This prediction reproduced the known disease mechanism and highlighted AlphaGenome's ability to link specific non-coding variants to disease genes.
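The TAL1 mechanism, in which a mutation introduces a binding motif that was not there before, can be illustrated with a simple before-and-after motif search. This is a hand-rolled sketch, not how AlphaGenome reaches its prediction. The pattern `[CT]AAC[ACGT]G` is an assumed MYB-like consensus used for illustration, the two sequences are invented toy strings rather than the patient mutations, and only the forward strand is searched.

```python
import re

# Assumed MYB-like consensus (IUPAC "YAACNG") written as a regular expression.
# The lookahead lets overlapping matches be reported.
MYB_LIKE = re.compile(r"(?=([CT]AAC[ACGT]G))")


def motif_hits(seq):
    """Return (start, matched_text) for every motif match on the forward strand."""
    return [(m.start(), m.group(1)) for m in MYB_LIKE.finditer(seq.upper())]


def insert_bases(seq, index, inserted):
    """Return `seq` with `inserted` placed before position `index`."""
    return seq[:index] + inserted + seq[index:]


# Invented toy sequences, for illustration only
reference = "GGCTAAGGTC"
mutated = insert_bases(reference, 6, "CG")   # -> "GGCTAACGGGTC"

print("reference hits:", motif_hits(reference))
print("mutated hits:  ", motif_hits(mutated))
```

In this toy case the reference has no match and the two-base insertion creates one, which is the kind of motif gain the leukemia example describes. A sequence-to-function model goes further by predicting whether such a gain actually changes the activity of a nearby gene.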
Professor Marc Mansour, from University College London, commented, "AlphaGenome will be a powerful tool for the field. Determining the relevance of different non-coding variants can be extremely challenging, particularly to do at scale. This tool will provide a crucial piece of the puzzle, allowing us to make better connections to understand diseases like cancer." AlphaGenome has current limitations. Accurately capturing the influence of very distant regulatory elements, those over 100,000 DNA letters away, remains a challenge, similar to other sequence-based models. A priority for future work involves increasing the model's ability to capture cell- and tissue-specific patterns. AlphaGenome has not been designed or validated for personal genome prediction. While it can predict molecular outcomes, it does not provide a complete understanding of how genetic variations lead to complex traits or diseases, which often involve broader biological processes like developmental and environmental factors beyond the model's direct scope. Efforts are ongoing to improve the models and gather feedback to address these gaps. AlphaGenome is available for non-commercial use via the AlphaGenome API. Its predictions are intended solely for research use and have not been designed or validated for direct clinical purposes. Researchers globally are invited to communicate potential use-cases for AlphaGenome and to pose questions or provide feedback through the community forum.
Google DeepMind unveils AlphaGenome, an AI model that predicts how DNA sequences affect gene expression and regulation, potentially revolutionizing genomic research and disease understanding.
Google DeepMind has unveiled AlphaGenome, a groundbreaking artificial intelligence (AI) model designed to tackle one of science's most fundamental challenges: decoding the 'dark matter' of our DNA. This innovative tool aims to help scientists make sense of the 98% of the human genome that doesn't code for proteins but plays a crucial role in gene regulation and cellular function 1.
AlphaGenome represents a significant leap forward in genomic research. The model can process up to one million DNA base pairs at once, predicting thousands of molecular properties that characterize regulatory activity 4. It offers insights into gene expression, splicing patterns, protein-binding sites, and chromatin accessibility across diverse cell types 5.
Pushmeet Kohli, DeepMind's head of AI for science, emphasized the model's significance: "This is one of the most fundamental problems not just in biology -- in all of science" 1.
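AlphaGenome's ability to predict the consequences of single DNA letter changes sets it apart from existing models. This feature allows researchers to score the effect of a mutation by comparing the model's predictions for the mutated and unmutated sequences.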
The model's architecture combines convolutional layers for detecting short DNA patterns with transformers that share information across the whole input sequence. In benchmark comparisons, AlphaGenome outperformed the best external models in 22 of 24 evaluations of predictions for single DNA sequences, and matched or exceeded them in 24 of 26 evaluations of predicting a variant's effect on gene regulation 4 2.
Despite its advancements, AlphaGenome has room for improvement. The model struggles with predicting the effects of sequences on genes located more than 100,000 base pairs away. Additionally, it doesn't yet capture how a cell's changing nature affects DNA sequence function 1.
DeepMind plans to expand AlphaGenome's capabilities, aiming to provide better insights into how genetic variations lead to complex traits or diseases 2.
AlphaGenome is currently available for non-commercial research through DeepMind's API, with plans for a fuller release in the future 3. This accessibility is expected to accelerate genomic research and potentially lead to breakthroughs in understanding diseases and developing new treatments 5.
As the scientific community begins to explore AlphaGenome's potential, it stands poised to revolutionize our understanding of the human genome and its role in health and disease, much as DeepMind's earlier AlphaFold did for protein structure prediction.