[C8] MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Seyoung Song, Seogyeong Jeong, Eunsu Kim, Jiho Jin, Dongkwan Kim, Jay Shin, Alice Oh. Findings of the Empirical Methods in Natural Language Processing: EMNLP 2025 (EMNLP-Findings 2025, Long), 2025
One-sentence Summary: A language-agnostic framework that evaluates LLM multilingual generation by measuring task completion rates when a model communicates with itself in the target language, enabling objective assessment across 2,100+ languages without requiring language-specific tools or human annotators.
