Researchers create an AI to predict the effects of DNA variations on human health
Google DeepMind's new model will allow users to identify the origin of genetic diseases and develop new treatments.
Tumors can have thousands of mutations, but only a few are significant and cause disease. Understanding which ones trigger cancer is crucial for designing precise and effective treatments, yet it remains a major and often insurmountable challenge in biology. This could now begin to change thanks to an AI developed by the Google DeepMind lab, which will allow for a better understanding of the genome and a faster, more efficient interpretation of differences in DNA sequences.
The team has developed a deep learning model, called AlphaGenome, that can accurately predict the function of DNA sequences of up to one million letters, or base pairs, and how variations in these sequences affect cells, tissues, and human health. The researchers, who publish the model in Nature, believe it can be a very useful tool for the scientific community to advance knowledge of the function of the human genome and of genetic diseases, and that it will open the door to developing new treatments.
"It's a great example of how AI is accelerating biological discovery and the development of new therapies," says Ben Lehner, ICREA researcher at the Centre for Genomic Regulation (CRG). Speaking to the Science Media Center, this biologist says that "identifying the exact differences between genomes that make us more or less vulnerable to developing thousands of diseases is a major step forward in developing better treatments."
The dark side of the genome
In 2003, after more than a decade of effort, an international coalition of scientists published the first sequence of the human genome, the complete set of DNA that determines what a living organism is like and how it functions, from its appearance to its reproduction and the tasks performed by each of its constituent cells. However, although this ambitious project succeeded for the first time in obtaining the "book of life," reading and understanding it remained a challenge because its grammar was not understood.
In recent years, it has been discovered, for example, that only 2% of the entire genome codes for proteins, which are the "workers" responsible for carrying out cellular tasks. The remaining 98% is non-coding, a dark and unknown part that contains many repetitive sequences, mobile elements, and regulatory DNA sequences, as Lluís Montoliu, a researcher at the National Center for Biotechnology (CNB-CSIC), explained to the Science Media Center Spain. These regulatory sequences are responsible for telling genes when and how to start functioning, or when and where to switch off. They act like a control panel for protein production. And it is in these dark regions that many of the variations or mutations associated with diseases are found. That is why, for years, algorithms and programs have been developed to try to understand which sequences are regulatory.
"Now DeepMind has once again left us speechless with AlphaGenome and its ability to interpret and predict non-coding sequences in the genome," notes Montoliu, for whom this new AI will have "a significant impact on basic research, for understanding how genes work, and also on more practical aspects." The model, says Natasha Latysheva, a DeepMind engineer, will boost fundamental biology, accelerate our understanding of the genome, and help locate functional elements and determine their roles.
To train AlphaGenome, the researchers used human and mouse genomes together with data from international public initiatives such as ENCODE and GTEx, which have produced a vast amount of information on gene regulation in different tissues and conditions. One of the model's strengths is its ability to make predictions simultaneously for a large number of genetic signals associated with specific functions.
A Nobel Prize winner
A few years ago, this same laboratory developed AlphaFold, an AI that could predict the 3D structure of proteins from their amino acid sequence alone, an advance that was recognized with the Nobel Prize in Chemistry in 2024. DeepMind has now added AlphaGenome to its AlphaFold suite. Six months ago, AlphaGenome was released openly so the scientific community could begin using it for research. In that time, it has been used by three thousand scientists from 160 countries, generating around one million requests per day to advance research in areas as diverse as neurodegenerative diseases, infectious diseases, and cancer.
Although the model can predict molecular outcomes, DeepMind scientists emphasize that it does not provide the complete picture of how genetic variations lead to diseases or complex traits, because other factors, such as environmental ones, are also involved.