[2603.26511] AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
Computer Science > Computation and Language
arXiv:2603.26511 (cs)
[Submitted on 27 Mar 2026]
Title: AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
Authors: Afonso Simplício, Gonçalo Vinagre, Miguel Moura Ramos, Diogo Tavares, Rafael Ferreira, Giuseppe Attanasio, Duarte M. Alves, Inês Calvo, Inês Vieira, Rui Guerra, James Furtado, Beatriz Canaverde, Iago Paulo, Vasco Ramos, Diogo Glória-Silva, Miguel Faria, Marcos Treviso, Daniel Gomes, Pedro Gomes, David Semedo, André Martins, João Magalhães
Abstract: Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid...