Research

Cultural knowledge is essential for an LLM to truly understand a language. My main interest is to gain a deeper comprehension of the capabilities and limitations of LLMs, since we cannot improve what we cannot measure. At the EPFL NLP lab, I am currently doing research on multilingual and multicultural LLM evaluation. I want to explore the evaluation and mitigation of cultural and linguistic bias in LLMs, taking a holistic approach to language understanding.

Last update: March 2026 | For up-to-date information, check my Google Scholar or Semantic Scholar profiles!

Highlights

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America

María Grandury, Javier Aula-Blasco, Júlia Falcão, Clémentine Fourrier, Miguel González Saiz, Gonzalo Martínez, Gonzalo Santamaria Gomez, Rodrigo Agerri, Nuria Aldama García, Luis Chiruzzo, Javier Conde, Helena Gomez Adorno, Marta Guerrero Nieto, Guido Ivetta, Natàlia López Fuertes, Flor Miriam Plaza-del-Arco, María-Teresa Martín-Valdivia, Helena Montoro Zamorano, Carmen Muñoz Sanz, Pedro Reviriego, Leire Rosado Plaza, Alejandro Vaca Serrano, Estrella Vallecillo-Rodríguez, Jorge Vallego, Irune Zubiaga

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 2025

First paper at ACL Main!

Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Catalan, Basque, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.

The Case of Spanish as a Pluricentric Language: Challenging the Monolingual Bias in NLP to Improve Cultural Adequacy of LLMs

María Grandury and Diana Galvan-Sosa

1st Workshop on Multilingual and Equitable Language Technologies (MELT) at the Conference on Language Modeling (COLM), Montreal, Canada, 2025

Spotlight paper!

Paper
This position paper argues that the Natural Language Processing (NLP) community's oversight of Spanish's pluricentric nature undermines the development of culturally adequate models. Achieving truly effective NLP requires acknowledging the inherent cultural nuances embedded in language, yet a prevalent misconception persists that a singular "standard Spanish" originates primarily from Spain. Drawing on interdisciplinary insights, we believe that the distinction between "correct" and "exemplary" Spanish language forms is key to effectively addressing the challenges posed by Spanish pluricentricity. This distinction allows the recognition of each Spanish-speaking nation as a distinct standardization center, where "exemplary" language is inherently community-defined. Maldonado applied this distinction to differentiate Spanish varieties, but with limited coverage. Motivated by these limitations, we propose a community-focused annotation framework to generate data for improving cultural adequacy in Large Language Models (LLMs), emphasizing broader engagement and contribution recognition. We then critically examine current multicultural datasets, highlighting shortcomings (e.g., limited representation, missing variation metadata), underscoring the urgent need for a more inclusive and culturally aware approach.

Published Papers

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan Eghlidi, Chris Schmitz, Karolina Korgul, (...) María Grandury (...), Luc Rocher, Adam Mahdi

Advances in Neural Information Processing Systems (NeurIPS), San Diego, CA, USA, 2025

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Tairan Fu, Javier Conde, Gonzalo Martínez, María Grandury, Pedro Reviriego

The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain

María Grandury

North American Chapter of the Association for Computational Linguistics Conference: LatinX in AI (LXAI) Research Workshop, Mexico City, Mexico, 2024

Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?

Marina Mayor-Rocher, Nina Melero, Elena Merino-Gómez, María Grandury, Javier Conde, Pedro Reviriego

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Jelle Jumelet, Abdellah Fourtassi, Atsunori Haga, Björn Bunzeck, Bhavya Shandilya, (...) María Grandury (...), Arianna Bisazza, Alex Warstadt, Leshem Choshen

European Association of Computational Linguistics (EACL), Rabat, Morocco, 2026

Multicultural LLM Evaluation: The Case of Spanish as a Pluricentric Language

María Grandury

Master's Thesis, Universidad Nacional de Educación a Distancia (UNED), 2025

Spanish is not just one: A dataset of Spanish dialect recognition for LLMs

Gonzalo Martínez, Marina Mayor-Rocher, Carlos P. Huertas, Nina Melero, María Grandury, Pedro Reviriego

Data in Brief, 63, 112088, 2025

Paper

It's the same but not the same: Do LLMs distinguish Spanish varieties?

Marina Mayor-Rocher, Cristina del Pozo, Nina Melero, Gonzalo Martínez, María Grandury, Pedro Reviriego

Procesamiento del Lenguaje Natural, 75, 137-146, 2025

Paper

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

Alejandro Hernández-Cano, Andreas Hägele, Andrew H. Huang, Angelika Romanou, Arnau Solergibert, Bálint Pásztor, Benjamin Messmer, (...) María Grandury (...), Antoine Bosselut, Martin Jaggi, Imanol Schlag

Paper

Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings

Javier Conde, María Grandury, Tairan Fu, Carlos Arriaga, Gonzalo Martínez, Tom Clark, Sean Trott, Christopher Green, Pedro Reviriego, Marc Brysbaert

Paper

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Javier Conde, Miguel González, María Grandury, Gonzalo Martínez, Pedro Reviriego, and Marc Brysbaert

Proceedings of the 4th Workshop on Generation, Evaluation and Metrics (GEM²), Vienna, Austria, 2025

Paper

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Iván Salazar, María Fernández Burda, Sajid Bin Islam, Amir Soltani Moakhar, Shubham Singh, Fredrik Farestam, Angelika Romanou, (...) María Grandury (...)

The 14th International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 2026

Paper

Spanish and LLM Benchmarks: Is MMLU Lost in Translation?

Irene Plaza, Nina Melero, Cristina del Pozo, Javier Conde, Marina Mayor-Rocher, María Grandury, Pedro Reviriego

Proceedings of the 2nd International Generative AI and Computational Language Modelling Conference (GACLM), 2024

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, (...) María Grandury (...), Thomas Wolf

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

Javier De la Rosa, Eduardo G. Ponferrada, Manu Romero, Paulo Villegas, Pablo González de Prado Salas, María Grandury

Procesamiento del Lenguaje Natural, 68(0), 13–23, 2022

Guest Lectures

I've always loved teaching, and I'm grateful for these opportunities to share my experience and research with the community!

RLHF & Model Alignment

National Center of Artificial Intelligence (CENIA) | Diplomado de PLN
Guest Lecture
Chile (Remote) 🇨🇱

Synthetic Data Generation and LLM Evaluation

Universidad Nacional Autónoma de México (UNAM) | Bachelor's Degree in Data Science for Social Sciences and Humanities
Guest Lecture
Mexico (Remote) 🇲🇽

Community Service

Reviewer

  • Royal Society Open Science journal, 2026
  • Simposio LANLP: Bridging Latin American NLP, 2026
  • Workshop EXIST: sEXism Identification in Social neTworks, 2025

Diversity & Inclusion

  • Diversity & Inclusion Chair at EACL 2026
  • Birds-of-a-Feather (BoF) organizer at ACL 2025
  • Birds-of-a-Feather (BoF) organizer at COLM 2025