AI in Catalan

"I have seen the professor": AI spreads mistakes in Catalan (due to Spanish influence)

A UPF study warns that training AI on English and Spanish has consequences for Catalan

A user browsing ChatGPT.
26/02/2026

Barcelona. A study by Pompeu Fabra University (UPF) shows that the most popular generative artificial intelligence (AI) tools spread errors in Catalan because they are trained on Spanish as well as English: the influence of these global languages leads them to propagate non-standard grammatical and lexical forms in Catalan. The study, led by Professor Thomas Brochhagen of UPF's Department of Translation and Language Sciences and published in the journal "Linguamática", is a pioneer in demonstrating this bias towards Spanish.

According to UPF researcher Mireia Almena, "AIs not only reproduce language but also influence its evolution, and they can have a much greater impact on languages like Catalan, which have less written content in digital media, than on languages with more speakers and greater capacity for text production, such as English, Spanish or Chinese." The researchers therefore call on institutions to work to correct these biases. In fact, the Accent Obert Foundation has already announced that it will "put the most popular AIs through the official exams taken by Catalan students" to evaluate their knowledge of Catalan language and culture and to objectively document the shortcomings in this area.

Spreading errors

The study analyzed six AI models, including ChatGPT and Gemini, using an evaluation corpus of 160 sentences covering eight grammatical structures that often raise doubts about which preposition to use. One example is whether to use a preposition before a direct object: Spanish uses one, whereas the general rule in Catalan is not to. That is, one says "he visto al profesor" in Spanish but "he vist el professor" in Catalan. The researchers also detected errors unrelated to Spanish, in sentences like "No soc gens propens d'enfadar-me [the standard form is "a enfadar-me"] per bajanades" (I am not at all prone to getting angry over trifles). According to the study, when choosing between standard and non-standard prepositions, multilingual AIs make mistakes in 55% of cases because of the influence of Spanish and in 4% of cases for other reasons. Monolingual AIs make mistakes in 27% of cases.
