Genetics

Bellvitge Hospital leads an AI project to understand the human genome

The advances will enable the development of new treatments for diseases that currently have none.

Geneva. A team including the Bellvitge Biomedical Research Institute (IDIBELL) has launched a revolutionary project to exploit human genome data with artificial intelligence. The project is part of the Structural Genomics Consortium (SGC), a global public-private consortium made up of seven universities and nine pharmaceutical companies. The goal is to help uncover the function of many of the proteins encoded in the genome and thus accelerate the discovery of new drugs for diseases that still have no treatment.

Although the human genome was sequenced more than 25 years ago, scientists still know very little about the functions of each of the genes it contains. Understanding their function is essential to understanding many of the processes that occur in our bodies and can trigger the onset of diseases.

"We don't know what 30% of genes do because no one studies them," says Albert Antolín, head of the medicinal chemistry and drug design research group at IDIBELL and one of the initiative's coordinators.

Similarly, the chemical compounds that interact with each of these genes and allow them to be activated or inhibited are also unknown. "This consortium encourages research into little-studied proteins," adds Antolín.

A screening of thousands of proteins

To better understand the processes that take place inside cells, it is necessary to exploit the vast amount of data contained in the genome. Artificial intelligence is emerging as a fundamental tool for carrying out this task. However, in order to train AI models, a large amount of experimental data must be collected.

"The limitation is that there is not enough data to train the models well, and they are trained with very small and fragmented data sets," says Antolín, who adds that "the objective over the next five years is to generate a huge amount of data to create more precise AI models." The project is part of a global initiative called Target 2035, which aims to discover a chemical compound for every human protein by 2035.

The article with the details of the project will be published soon in the journal Nature Reviews Chemistry. Using advanced screening techniques, the project will experimentally cross-reference more than a thousand proteins present in the human genome with billions of chemical compounds over the next five years.

"It's not enough for a chemical compound to bind to a protein; it's also necessary for this compound to be selective." The long-term goal is to perform the same process with the approximately 20,000 proteins encoded in the genome. Studying the functions of a wide variety of proteins under physiological or pathological conditions would allow us to understand how to inhibit them, for example. This could have important consequences for the treatment and prevention of many types of cancer as well as neurodegenerative diseases such as Alzheimer's.

An open science project

The Target 2035 project is part of an open science initiative aimed at facilitating the discovery of new drugs, with a special emphasis on the study of understudied proteins. The data extracted by the consortium can be used by any research center or pharmaceutical company to train their own AI models. "It's very important that fundamental science be open and that everyone can access this information," says Antolín.

This project is a collaboration between renowned public institutions and large private entities in the pharmaceutical world. "Research into many diseases requires clinical trials, which are very expensive. Public-private collaboration accelerates this process, especially in the early stages of developing a new drug," explains Antolín.

A global support network of experts

To advance the creation of powerful and accurate models, the collaboration aims to hold open competitions, where different research centers and institutions test their artificial intelligence systems. These competitions allow for direct comparison of the performance of different models and also for the collaborative exchange of ideas and information. The first competitive challenge, called Dream Challenge, is open, and teams can register and access the data. "Participating teams have very large datasets from genomic data interaction repositories to train their models. The challenge is to accurately predict the outcome of a different dataset," explains Antolín.

Among the participants in this competition is MAINFRAME, a global network of scientists with expertise in AI and computational chemistry, which is led by Antolín himself and already has more than 180 members from 43 countries. The idea behind these competitions is also to take part in the debate on how to improve machine learning and AI models. "We need to get a lot of people involved in these competitions. It's the best way to learn and progress," Antolín concludes.