[2604.27747] Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

[2604.27747] Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2604.27747: Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

Computer Science > Information Retrieval arXiv:2604.27747 (cs) [Submitted on 30 Apr 2026] Title:Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation Authors:Jiaju Chen, Chongming Gao, Chenxiao Fan, Haoyan Liu, Qingpeng Cai, Peng Jiang, Xiangnan He View a PDF of the paper titled Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation, by Jiaju Chen and 6 other authors View PDF HTML (experimental) Abstract:Large language model (LLM)-based generative list-wise recommendation has advanced rapidly, but decoding remains sequential and thus latency-prone. To accelerate inference without changing the target distribution, speculative decoding (SD) uses a small draft model to propose several next tokens at once and a target LLM to verify and accept the longest prefix, skipping multiple steps per round. In generative recommendation, however, each item is represented by multiple semantic-ID tokens, often with separators, and current drafts typically treat these tokens uniformly. This overlooks two practical facts: (i) a token's semantics depend on its within-item slot, and (ii) uncertainty tends to increase with speculation depth. Without modeling these effects, SD's speedups can be limited. We introduce PAD-Rec, Position-Aware Drafting for generative Recommendation, a lightweight module that augments the draft model with two complementary signals. Item position embeddings explicitly encode the ...

Originally published on May 01, 2026. Curated by AI News.

Related Articles

Researchers asked ChatGPT, Gemini and Claude which jobs are most exposed to AI. The chatbots wildly diagree
Llms

Researchers asked ChatGPT, Gemini and Claude which jobs are most exposed to AI. The chatbots wildly diagree

A study reveals that AI models disagree on which jobs are most vulnerable to automation, highlighting the unreliability of AI-generated e...

AI Tools & Products · 4 min ·
I stopped treating ChatGPT like Google — and everything suddenly clicked
Llms

I stopped treating ChatGPT like Google — and everything suddenly clicked

I stopped using ChatGPT like Google and started treating it like a thinking partner — here’s why that simple shift made the AI dramatical...

AI Tools & Products · 8 min ·
Hackers abuse Google ads, Claude.ai chats to push Mac malware
Llms

Hackers abuse Google ads, Claude.ai chats to push Mac malware

AI Tools & Products · 6 min ·
Llms

Does Claude dream of electric gavels? A federal case with Kansas connections sets an AI precedent.

AI Tools & Products ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime