[2602.15373] Far Out: Evaluating Language Models on Slang in Australian and Indian English

arXiv - AI · 4 min read

Summary

This paper evaluates the performance of language models on slang in Australian and Indian English, revealing significant gaps in understanding non-standard language varieties.

Why It Matters

Understanding how language models handle slang is crucial for improving their effectiveness in diverse linguistic contexts. This research highlights the need for better model training on variety-specific language, which is essential for applications in natural language processing and AI development.

Key Takeaways

  • Language models show performance gaps in understanding slang.
  • Australian English slang is less accurately processed than Indian English slang.
  • Models perform better on real-world data compared to synthetically generated examples.
  • Target word selection tasks yield higher accuracy than prediction tasks.
  • The study underscores the importance of training models on diverse language varieties.

Computer Science > Computation and Language
arXiv:2602.15373 (cs) · Submitted on 17 Feb 2026

Title: Far Out: Evaluating Language Models on Slang in Australian and Indian English
Authors: Deniz Kaya Dilsiz, Dipankar Srirag, Aditya Joshi

Abstract: Language models exhibit systematic performance gaps when processing text in non-standard language varieties, yet their ability to comprehend variety-specific slang remains underexplored for several languages. We present a comprehensive evaluation of slang awareness in Indian English (en-IN) and Australian English (en-AU) across seven state-of-the-art language models. We construct two complementary datasets: WEB, containing 377 web-sourced usage examples from Urban Dictionary, and GEN, featuring 1,492 synthetically generated usages of these slang terms across diverse scenarios. We assess language models on three tasks: target word prediction (TWP), guided target word prediction (TWP*), and target word selection (TWS). Our results reveal four key findings: (1) higher average model performance on TWS versus TWP and TWP*, with average accuracy increasing from 0.03 to 0.49 respectively; (2) stronger average model performance on WEB versus GEN datasets, with average similarity score increasing by 0.03 and 0.05 across TWP...
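The three tasks in the abstract reduce to accuracy-style scoring over slang usage examples. As a rough illustration of how a TWS-style (multiple-choice) evaluation could be scored, here is a minimal sketch; the slang items, masked sentences, and the `score_tws` helper are all made up for illustration and are not the authors' actual dataset, prompts, or code.

```python
# Hypothetical sketch of a target word selection (TWS) evaluation: given a
# usage example with the slang term masked out, the model must pick the
# correct term from a small set of candidates. All data below is invented.

def score_tws(examples, choose):
    """Accuracy of a selection function over (sentence, options, gold) items."""
    correct = 0
    for sentence, options, gold in examples:
        if choose(sentence, options) == gold:
            correct += 1
    return correct / len(examples)

# Toy items in the spirit of web-sourced en-AU slang usages, with ____ marking
# the slot the candidate term should fill.
examples = [
    ("That party was a ____, everyone loved it.", ["ripper", "servo", "arvo"], "ripper"),
    ("I'll stop at the ____ to grab fuel.", ["ripper", "servo", "arvo"], "servo"),
]

# Stand-in "model": always picks the first option (a trivial baseline).
baseline = lambda sentence, options: options[0]
accuracy = score_tws(examples, baseline)  # 1 of 2 correct → 0.5
```

A real evaluation would replace `baseline` with a call to a language model and aggregate accuracy per variety and per dataset; prediction tasks like TWP would instead compare the model's free-form completion against the gold term, which is why the paper can report much lower scores there.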
