[2510.10827] Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

[2510.10827] Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2510.10827: Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Computer Science > Computation and Language arXiv:2510.10827 (cs) [Submitted on 12 Oct 2025 (v1), last revised 23 Mar 2026 (this version, v2)] Title:Happiness is Sharing a Vocabulary: A Study of Transliteration Methods Authors:Haeji Jung, Jinju Kim, Kyungjin Kim, Youjeong Roh, David R. Mortensen View a PDF of the paper titled Happiness is Sharing a Vocabulary: A Study of Transliteration Methods, by Haeji Jung and 4 other authors View PDF Abstract:Transliteration has emerged as a promising means to bridge the gap between various languages in multilingual NLP, showing promising results especially for languages using non-Latin scripts. We investigate the degree to which shared script, overlapping token vocabularies, and shared phonology contribute to performance of multilingual models. To this end, we conduct controlled experiments using three kinds of transliteration (romanization, phonemic transcription, and substitution ciphers) as well as orthography. We evaluate each model on three downstream tasks -- named entity recognition (NER), part-of-speech tagging (POS) and natural language inference (NLI) -- and find that romanization significantly outperforms other input types in 11 out of 12 evaluation settings, largely consistent with our hypothesis that it is the most effective approach. We further analyze how each factor contributed to the success, and suggest that having longer (subword) tokens shared with pre-trained languages leads to better utilization of the model. Com...

Originally published on March 25, 2026. Curated by AI News.

Related Articles

Llms

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·
Top 10 AI certifications and courses for 2026
Ai Startups

Top 10 AI certifications and courses for 2026

This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and t...

AI Events · 15 min ·
Hub Group Using AI, Machine Learning for Real-Time Visibility of Shipments
Machine Learning

Hub Group Using AI, Machine Learning for Real-Time Visibility of Shipments

Hub Group says it’s using artificial intelligence and machine learning to leverage data from its GPS-equipped container fleet to give cus...

AI Events · 4 min ·
UMKC Announces New Master of Science in Artificial Intelligence
Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime