European AI will not speak Catalan if it is not an official language
The inclusion of Catalan in official EU translation tools is not a problem of lack of data, but of institutional recognition.
Being able to speak Catalan in the European Parliament, and allowing citizens to ask questions and receive answers in this language, is one of the most visible and widely discussed consequences of making Catalan an official language of the European Union. But beyond its practical and symbolic importance, this recognition would also bring advantages for technological development. In fact, it is a vital element for boosting that development.
"Official status would bring more demand and more sustained investment in linguistic tools for Catalan, such as machine translation and interpretation," says Maite Melero, who leads the Machine Translation research group at the AI Institute at the Barcelona Supercomputing Center (BSC-CNS) and is one of the leaders of the AINA project, which aims to guarantee the presence of Catalan in the digital age.
One of the tools that would incorporate Catalan is eTranslation, the European Commission's official machine translation system, a latest-generation neural translator. Currently, this system translates to and from the 24 official EU languages, as well as to Arabic, Chinese, Icelandic, Japanese, Norwegian, Russian, Turkish, and Ukrainian.
One of the essential components of this translation system is the IATE database (Interactive Terminology for Europe), which compiles all the terminology used in European institutions. It was created in 1999, and its website gives an explicit and clear answer as to why certain languages are not included: "The public version of IATE contains the 24 official languages of the European Union."
A political, not a technological, obstacle
The integration of Catalan into eTranslation is not a technological problem. "At first, they told us that to incorporate Catalan they needed large corpora of parallel data between Catalan and English. When we compiled this data in the context of AINA, the response was that the obstacle was now not technical but political," explains Melero. TERMCAT, in collaboration with the UOC, has translated the IATE terms into Catalan. The presence of our language in eTranslation is essential, because this translator is not only available to EU institutions but is also offered free of charge to national public administrations, local and regional authorities, small and medium-sized enterprises, academia, and non-governmental organizations. Imagine, then, the impact this would have.
Currently, Catalan has a solid base of digital linguistic resources. The AINA project has generated corpora (large collections of texts in digital format used to train linguistic tools) that are more than enough for Catalan to be incorporated into eTranslation and other artificial intelligence tools. For example, AINA has created CATalog, a massive text corpus composed of more than 17.4 billion words spread across more than 34 million documents from highly diverse sources.
"The volume and diversity of Catalan corpora are far greater than those of other languages," explains Albert Cuesta, a technology journalist who conducted the study "AI in the future of non-hegemonic European languages," recently published by the Irla, Coppieters, and Accent Obert foundations. "The corpora of Irish, Maltese, the Baltic languages, Slovenian, Slovak, Croatian, and Hungarian," he adds, "are quite biased toward legal content, as a result of their official status within the EU, which requires the translation of EU regulations."
Cuesta points out that the EU has been a pioneer in AI regulation with the Artificial Intelligence Act of June 2024. He believes that "although the law does not directly mention languages, it can encourage the inclusion of non-hegemonic ones. The Digital Services Directive, which complements the terms and conditions in all official EU languages, provides multilingual customer support and localized interfaces."
This technology expert, a regular contributor to ARA, believes that official status, aside from the prestige it would bring, would further one of the objectives of the Accent Obert Foundation, which works, among other things, to promote the presence of Catalan in the digital sphere, such as in applications and products. Last year, for example, it succeeded in getting the Chinese company BYD and Seat Cupra to incorporate it into their cars. But he emphasizes that "the implicit demand for Catalan from Catalan-speaking citizens, who configure their computers and mobile phones with Catalan as their preferred language, also works in our favor. That's why we have the configure.cat campaign underway."
All these advances require prior research. For Melero, official status would allow for obtaining "much more parallel data and quality institutional terminology and would give a boost to research, because Catalan would always be present in evaluation standards, open resources, and European projects." That means research and innovation with technological solutions that include our language and, of course, respect the digital rights of Catalan speakers.