[2604.27747] Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
arXiv:2604.27747 (cs) [Submitted on 30 Apr 2026]

Computer Science > Information Retrieval

Title: Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

Authors: Jiaju Chen, Chongming Gao, Chenxiao Fan, Haoyan Liu, Qingpeng Cai, Peng Jiang, Xiangnan He

Abstract: Large language model (LLM)-based generative list-wise recommendation has advanced rapidly, but decoding remains sequential and thus latency-prone. To accelerate inference without changing the target distribution, speculative decoding (SD) uses a small draft model to propose several next tokens at once and a target LLM to verify and accept the longest prefix, skipping multiple steps per round. In generative recommendation, however, each item is represented by multiple semantic-ID tokens, often with separators, and current drafts typically treat these tokens uniformly. This overlooks two practical facts: (i) a token's semantics depend on its within-item slot, and (ii) uncertainty tends to increase with speculation depth. Without modeling these effects, SD's speedups can be limited. We introduce PAD-Rec, Position-Aware Drafting for generative Recommendation, a lightweight module that augments the draft model with two complementary signals. Item position embeddings explicitly encode the ...
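The draft-then-verify loop the abstract describes can be illustrated with a minimal sketch. The toy `draft_model` and `target_model` functions below are hypothetical stand-ins for real LLMs, and greedy decoding is assumed for simplicity; the sketch only shows the accept-longest-prefix mechanics, not the paper's position-aware drafting.

```python
# Minimal speculative-decoding (SD) verification sketch: the draft model
# proposes k next tokens at once, and the target model accepts the longest
# agreeing prefix, then supplies one corrected token. The two models here
# are hypothetical lookup tables, not the paper's actual components.

def draft_model(context):
    # Hypothetical cheap draft: accurate early, wrong at deeper slots.
    table = {(): "a", ("a",): "b", ("a", "b"): "x"}
    return table.get(tuple(context), "?")

def target_model(context):
    # Hypothetical target: the reference next-token predictor.
    table = {(): "a", ("a",): "b", ("a", "b"): "c"}
    return table.get(tuple(context), "?")

def speculative_step(context, k=3):
    """Propose k draft tokens, accept the longest prefix the target
    agrees with, then append one token from the target itself."""
    # Drafting: roll the cheap model forward k steps.
    draft, ctx = [], list(context)
    for _ in range(k):
        tok = draft_model(ctx)
        draft.append(tok)
        ctx.append(tok)
    # Verification: accept until the target's greedy choice disagrees.
    accepted, ctx = [], list(context)
    for tok in draft:
        if target_model(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    # The target supplies the next token on mismatch (or after full
    # acceptance), so every round emits at least one target-quality token.
    accepted.append(target_model(ctx))
    return accepted

print(speculative_step([]))  # -> ['a', 'b', 'c']
```

One round here emits three tokens ("a", "b" accepted, "c" corrected) for a single verification pass, which is the source of SD's speedup; the paper's observation is that acceptance rates like this degrade with speculation depth and within-item slot unless the draft models those positions.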