[2405.10385] Augmenting Lateral Thinking in Language Models with Humor and Riddle Data for the BRAINTEASER Task


Summary

This paper explores enhancing language models' lateral thinking abilities by integrating humor and riddle datasets for the BRAINTEASER task, reaching 92.5% accuracy on the Sentence Puzzle subtask and 80.2% on the Word Puzzle subtask.

Why It Matters

The study addresses the underexplored area of lateral thinking in NLP, demonstrating how humor and riddles can improve language model performance. This has implications for developing more sophisticated AI systems capable of creative reasoning, which is vital for applications in education, entertainment, and problem-solving.

Key Takeaways

  • The BRAINTEASER task challenges models to perform creative reasoning.
  • Augmenting training data with humor and riddles significantly boosts performance.
  • Framing the task as multiple choice improves accuracy by about 10 points.
  • Sentence-level puzzles are easier for models than word-level puzzles.
  • The proposed system ranks among the top competitors: 6th of 31 teams on Sentence Puzzle and 10th of 23 on Word Puzzle.

Computer Science > Computation and Language — arXiv:2405.10385 (cs)
[Submitted on 16 May 2024 (v1), last revised 23 Feb 2026 (this version, v3)]

Title: Augmenting Lateral Thinking in Language Models with Humor and Riddle Data for the BRAINTEASER Task
Authors: Mina Ghashami, Soumya Smruti Mishra

Abstract: The SemEval 2024 BRAINTEASER task challenges language models to perform lateral thinking -- a form of creative, non-linear reasoning that remains underexplored in NLP. The task comprises two subtasks, Sentence Puzzle and Word Puzzle, requiring models to defy conventional commonsense associations. We present a system that fine-tunes DeBERTaV3 using HuggingFace's AutoModelForMultipleChoice architecture. We augment the provided training data with two additional sources: (1) a humor-style question-answering dataset generated via GPT-4 prompting, and (2) the RiddleSense dataset. This data augmentation strategy is motivated by the observation that humor and riddles share the lateral reasoning structure required by the task. Our best system achieves 92.5% overall accuracy on the Sentence Puzzle subtask and 80.2% on the Word Puzzle subtask, ranking 6th out of 31 teams and 10th out of 23 teams, respectively. We further show that the choice of task formulation matters: framing the problem as ...
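The multiple-choice framing described in the abstract can be sketched with HuggingFace's AutoModelForMultipleChoice: each candidate answer is paired with the question, and the model scores all pairs jointly. This is a minimal sketch, not the authors' actual pipeline; the base checkpoint, the example riddle, and the candidate answers are illustrative assumptions.

```python
# Minimal sketch of multiple-choice inference with DeBERTaV3.
# Checkpoint, question, and choices are illustrative, not from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForMultipleChoice.from_pretrained("microsoft/deberta-v3-base")
model.eval()

question = "What can you catch but not throw?"  # hypothetical riddle
choices = ["A ball", "A cold", "A frisbee", "None of the above"]

# Encode one (question, choice) pair per candidate answer, then add a
# batch dimension so tensors have shape (1, num_choices, seq_len).
enc = tokenizer([question] * len(choices), choices,
                padding=True, return_tensors="pt")
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)
pred = logits.argmax(dim=-1).item()
print(choices[pred])
```

Fine-tuning follows the same shape convention: during training, a `labels` tensor holding the index of the correct choice is passed alongside the inputs, and the model returns a cross-entropy loss over the per-choice logits.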
